CN101763392A - Retrieval architecture and retrieval method - Google Patents

Publication number
CN101763392A
CN101763392A CN200810241856A CN200810241856A CN101763392A CN 101763392 A CN101763392 A CN 101763392A CN 200810241856 A CN200810241856 A CN 200810241856A CN 200810241856 A CN200810241856 A CN 200810241856A CN 101763392 A CN101763392 A CN 101763392A
Authority
CN
China
Prior art keywords
index
index server
module
resource
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200810241856A
Other languages
Chinese (zh)
Inventor
雷凯
李晓明
徐阳
康泽宇
李挥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Shenzhen Graduate School
Original Assignee
Peking University Shenzhen Graduate School
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Shenzhen Graduate School
Priority to CN200810241856A
Publication of CN101763392A

Abstract

The invention discloses a retrieval architecture which comprises a resource collection module, an index creation module, an inquiry service module and a retrieval service module, wherein the index creation module establishes indexes for at least two types of resources in p2p resources, ftp resources and http resources; the retrieval service module comprises the corresponding at least two of a p2p index server, an ftp index server and an http index server, as well as a gateway module, and the gateway module is used for receiving an inquiry request, assigning to all the index servers, receiving and integrating inquiry results of all the index servers and returning to the inquiry service module. The invention further discloses a retrieval method used for the retrieval architecture. The retrieval architecture and the retrieval method can integrate the three types of search together, absorb the advantages of the three types of search engines and make up for the respective deficiencies, thereby providing a resource-abundant, stable and high-speed downloading platform for users.

Description

Retrieval architecture and retrieval method
Technical field
The present invention relates to Internet search, and in particular to a retrieval architecture and a retrieval method.
Background art
There are many kinds of search engines today. By purpose they fall into two classes. The first class is information-oriented: what the user obtains from a search is a set of matching web pages, the user's goal is usually to find information, and most of the user's activity is browsing pages. The second class is resource-oriented: users of such an engine usually have a very clear goal, namely to download the resources they need, such as video, audio, e-books, or software, and what they most expect is to find the desired resource and transfer it quickly.
Fig. 1 shows the general architecture of an existing search engine. The web crawler (crawler) starts from a set of seed websites and continuously and incrementally crawls network resources, including downloadable resources and web page files, and stores them in a mass database in a "url -> text document" structure. The index-building server (Indexer) periodically uses the incremental files in the mass database to build index files of the form "keyword - url". The index server (Index Server) takes as parameters the query requests of users client1, ..., n passed through the Common Gateway Interface (CGI), searches the index files, and returns the result to the CGI program. The CGI program runs on the web server (typically Apache). When a user submits a query through a client or a web page, the web server invokes the CGI program and passes it the query parameters (that is, a link carrying parameters). The CGI program parses the query parameters into a structure that the Index Server can recognize and issues the query request to the Index Server. After obtaining the query result from the Index Server, the CGI program converts the result into the appropriate format and submits it to the web server, which presents it as a web page to clients client1, ..., n.
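As a rough illustration of the "keyword - url" index that the Indexer builds from the "url -> text document" store, the following is a minimal Python sketch. The function names, the whitespace tokenization, and the sample URLs are assumptions made for illustration; the text does not specify how documents are tokenized or how index files are laid out.

```python
from collections import defaultdict


def build_index(documents):
    """Build a keyword -> set-of-urls index from a url -> text mapping."""
    index = defaultdict(set)
    for url, text in documents.items():
        # Assumption: lower-cased whitespace tokens stand in for real keyword extraction.
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, keyword):
    """Return the urls whose text contains the keyword, in sorted order."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical crawl results in the "url -> text document" form.
documents = {
    "http://example.com/a": "free music download",
    "http://example.com/b": "software download mirror",
}
index = build_index(documents)
print(search(index, "download"))
```

In this sketch the index server's work reduces to a dictionary lookup; a real index file would also need persistence and incremental updates, which are outside the scope of the example.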
The above is a general-purpose web search engine architecture; a search engine for downloadable network resources differs slightly. There are three known ways to obtain network resources: downloading via ftp; finding the needed resource on a portal website that offers downloads and downloading it from the corresponding server; and downloading via p2p. The search architectures for these three kinds of resources are essentially the same, but the resource sources and the ways in which resources are acquired differ greatly. Each is described in detail below.
1. ftp mode
The resource collection module for ftp consists of two parts: ftp site collection and collection of ftp site file information. The ftp site collection module relies mainly on site probing. Site probing means probing, one by one, port 21 on the more than 130 network segments of the domestic Internet and including every address that responds. The advantage of this method is that a large number of sites can be included; the disadvantage is that usually only anonymous sites on port 21 are supported. Once all ftp server sites that may offer files for download have been collected, the next step is to collect the file information provided by each server. Content extraction for a site means downloading all of the directory and file information of an ftp site and then building an index from it.
It also follows from the way ftp resources are collected that, because usually only anonymous sites on port 21 are supported, the ftp sites holding the resources a user needs may be few or unavailable to the user. The network resources obtainable in this way are therefore limited.
2. http search
A typical example is Baidu's music search. The downloadable resources come from various portal websites, which have their own dedicated servers to support downloads. The search engine's job is to crawl and collect the resource links of these seed websites, build an index on the resource names, and offer a query service to users. The advantage of this kind of search engine is that the download servers provided by the portal websites are very stable, so user searches are usually effective: the great majority of what is found can be downloaded successfully. In addition, because the resources are obtained from websites that offer downloads, categorized search is easy to provide, both resource acquisition and user queries are clearly targeted, and recall and precision can be well guaranteed.
This kind of search nevertheless has many limitations. Because the variety of resources is severely restricted, it cannot satisfy the needs of all users; and because of limits on the load, bandwidth, and number of connections of the download servers, download speed suffers greatly.
3. p2p search
The resources of a p2p search engine come from users who publish seeds of shared resources to the index module (tracker), or from online users who upload shared resources to a directory server. The directory server parses a user's resources into a fixed format (including the document title, the md5 code, the user's address information, and so on), and the index module builds the index from these data. A user's query for a keyword is passed to the Index Server through CGI. The Index Server obtains an initial query result from the index files; at the same time it obtains users' online status from a state server, filters out offline users accordingly, and returns the filtered result to the user. The user downloads a resource by connecting to users who hold that resource (the whole file or part of it) and downloading it to the local machine piece by piece.
In this mode, a user who is downloading a resource also serves downloads to other users, so the limits of server load and bandwidth disappear. The p2p download mode offers users a rich supply of resources, and download speeds for popular resources can be very high. However, when the users seeding a resource are offline, or when the file segments held by all other users cannot be assembled into the complete file, the file cannot be downloaded. In other words, obtaining resources in this way is highly unstable for the user.
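The filtering of offline users described for the p2p Index Server can be sketched as follows. This is only an illustration under stated assumptions: the record fields (title, md5, holders) mirror the fixed format mentioned above, but their names, and the choice to drop a resource that has no online holder, are hypothetical.

```python
def filter_online(initial_results, online_users):
    """Keep only holders that are online; drop resources with no online holder."""
    filtered = []
    for entry in initial_results:
        holders = [user for user in entry["holders"] if user in online_users]
        if holders:
            # Copy the record so the initial query result is left untouched.
            filtered.append({**entry, "holders": holders})
    return filtered


# Hypothetical initial query result and state-server snapshot.
initial_results = [
    {"title": "a.avi", "md5": "m1", "holders": ["u1", "u2"]},
    {"title": "b.mp3", "md5": "m2", "holders": ["u3"]},
]
online_users = {"u2"}
print(filter_online(initial_results, online_users))
```

The second record disappears from the output because its only holder is offline, which is the instability the paragraph above describes.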
As can be seen, the architectures of the above three resource search engines each have defects such as a limited quantity of resources, a single kind of resource, insufficient download speed, or unstable downloads.
Summary of the invention
The main purpose of the present invention is to solve the above problems in the prior art by providing a retrieval architecture and a retrieval method that make the retrieved resources richer and the download of resources faster and more stable.
To achieve the above purpose, the present invention adopts the following technical solution:
A retrieval architecture, comprising:
a resource collection module, configured to collect information on downloadable files on the Internet;
an index creation module, configured to build file indexes from the collection results;
an inquiry service module, configured to receive a user's query request and return the query result to the user; and
a retrieval service module, configured to respond to the query request and perform the corresponding query using the file indexes;
characterized in that the file indexes are built for at least two classes of resources among p2p resources, ftp resources, and http resources; the retrieval service module correspondingly comprises at least two of a p2p index server, an ftp index server, and an http index server, and further comprises a gateway module configured to receive the query request and dispatch it to each index server, and to receive and integrate the query results of the index servers and return them to the inquiry service module.
Preferably:
The inquiry service module comprises a web server and a Common Gateway Interface (CGI) running on the web server. The query request is delivered to the index servers after being parsed by the CGI; the query result is submitted to the web server after being processed by the CGI; and the web server presents the query result to the user as a web page.
The gateway module comprises:
a configuration initialization unit, configured to initialize the configuration file of the gateway module;
an index server initialization unit, configured to initialize the parameters of the index servers, including the IP, port, and type of each index server;
a connection establishing unit, configured to establish connections with the CGI and with the index servers; and
a query start unit, configured to start the corresponding query on the index servers according to the query request.
The connections include connections established by means of sockets.
The query start unit comprises a thread creation unit, configured to create, for each query request, one thread that completes that query.
A retrieval method, characterized by comprising the following steps:
A. collecting file information for at least two classes of resources among p2p resources, ftp resources, and http resources on the Internet;
B. building file indexes corresponding to the resource classes from the collection results;
C. receiving a user's query request;
D. responding to the query request by performing the corresponding p2p, ftp, or http retrieval using the respective file indexes;
E. integrating the query results and returning them to the user.
Preferably:
In step C, the query request is first parsed by the CGI on the web server and then dispatched to each index server by the gateway module. In step D, each index server uses the parsed keywords to perform the query corresponding to its server type, and sends its query result to the gateway module for integration. In step E, the integrated query result is first returned to the gateway module, then passed to the CGI for processing, and finally presented to the user as a web page by the web server.
The following steps are also performed before step D:
D1. initializing the configuration file of the gateway module;
D2. initializing the parameters of the index servers, including the IP, port, and type of each index server;
D3. establishing the connections between the gateway module and the CGI and between the gateway module and the index servers;
Step D comprises the following step:
D4. the gateway module starts the corresponding query on the index servers according to the query request.
In step D3, the connections are established by means of sockets.
In step D4, one thread is created for each query request in the query queue to complete the query.
The beneficial technical effects of the present invention are:
The retrieval architecture of the present invention comprises a resource collection module, an index creation module, an inquiry service module, and a retrieval service module. The index creation module builds indexes for at least two classes of resources among p2p resources, ftp resources, and http resources; the retrieval service module correspondingly comprises at least two of a p2p index server, an ftp index server, and an http index server, together with a gateway module. The gateway module receives the query request and dispatches it to each index server, then receives and integrates the query results of the index servers and returns them to the inquiry service module. In this way, the retrieval architecture and the retrieval method applied to it improve on the traditional single search engine architecture by integrating p2p search, ftp search, and http search, so that queries can be served through a unified interface program such as CGI. That is, the user needs only one query entry point, while the query results come from three sources: the p2p, ftp, and http search results. The invention thus absorbs the advantages of the three kinds of search engine and makes up for their respective deficiencies, and can provide users with a download platform that is rich in resources, stable, and fast. Combining the three kinds of search also reduces the system burden and allows more complex processing of query content and query results.
Description of the drawings
Fig. 1 shows the general architecture of a traditional search engine;
Fig. 2 shows the overall framework of the retrieval architecture according to an embodiment of the present invention;
Fig. 3 shows the implementation principle of the gateway module according to an embodiment of the present invention;
Fig. 4 shows the workflow of the gateway module according to an embodiment of the present invention;
Fig. 5 shows the algorithm of Thread ConnectServers in Fig. 4;
Fig. 6 shows the process of Creat commitSearch in Fig. 4;
Fig. 7 shows the process in Fig. 6 by which a thread completes a query;
Fig. 8 shows the execution of commitSearch within a thread;
Fig. 9 shows the flow of the retrieval method according to an embodiment of the present invention;
Fig. 10 shows the processing steps relating to the gateway module in the retrieval method of an embodiment.
The features and advantages of the present invention are described in detail below through embodiments, with reference to the drawings.
Embodiments
Referring to Fig. 2, the retrieval architecture comprises a resource collection module, an index creation module, a retrieval service module, and a web inquiry service module. The index creation module builds indexes for at least two classes of resources among p2p resources, ftp resources, and http resources, and the retrieval service module correspondingly comprises at least two of a p2p index server, an ftp index server, and an http index server. In a preferred embodiment, the index creation module builds three classes of index, for p2p resources, ftp resources, and http resources, and the retrieval service module correspondingly comprises p2p, ftp, and http index servers.
The web inquiry service module comprises a web server and a CGI running on the web server; the CGI provides the query interface through which the user interacts with the system. The retrieval service module comprises a gateway module GateWay, which listens for query requests passed from the CGI, dispatches each query request it receives to every index server, receives and aggregates the query results of the index servers, and returns them to the client after processing by the CGI.
The basic working principle of the retrieval architecture is as follows:
The user queries for the needed resource through a client (for example, a Maze client), and the query request containing the query keywords is delivered to the gateway module GateWay through the CGI;
The gateway module GateWay receives the query request and dispatches it to each index server;
Each index server performs a resource query with the query keywords, using its own file index, and returns the query result to the gateway module GateWay;
The gateway module GateWay aggregates and integrates the results returned by the index servers and submits the resulting data to the CGI, which finally returns them to the client via the web server.
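The dispatch-and-integrate role of the gateway in the steps above can be sketched in a few lines of Python. This is not the embodiment's code: the in-memory dictionaries stand in for the p2p, ftp, and http index servers, and the names gateway_query and index_servers, as well as the "source" tag added to each hit, are assumptions for illustration.

```python
def gateway_query(keyword, index_servers):
    """Send one keyword to every index server and merge the tagged results."""
    merged = []
    for server_type, query_fn in index_servers.items():
        for name in query_fn(keyword):
            # Tag each hit with the kind of server it came from.
            merged.append({"source": server_type, "name": name})
    return merged


# Hypothetical in-memory indexes standing in for the three index servers.
P2P_INDEX = {"song": ["song.mp3 (shared by 3 peers)"]}
FTP_INDEX = {"song": ["ftp://ftp.example.org/pub/song.mp3"]}
HTTP_INDEX = {"song": ["http://portal.example.com/song.mp3"]}

index_servers = {
    "p2p": lambda kw: P2P_INDEX.get(kw, []),
    "ftp": lambda kw: FTP_INDEX.get(kw, []),
    "http": lambda kw: HTTP_INDEX.get(kw, []),
}

results = gateway_query("song", index_servers)
for hit in results:
    print(hit["source"], hit["name"])
```

The user-facing side sees a single entry point (gateway_query), while the results come from all three sources, which is the property the architecture is built around.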
Fig. 3 is a schematic diagram of implementing queries with the gateway module GateWay, which is preferably implemented in C++. The components are as follows:
Client: the client that submits the search;
Search Model: the CGI program, which receives the client's query request;
InitializeGateWay: the routine that initializes the GateWay system, for example the conf file and the log file. Its function can be implemented by main(), buildBadMD5List(), and initServers() in the realCGI.cpp file and InitSetup() in the Identify.cpp file;
RevSearchRequset: receives the query request passed from the Search Model. Its function can be implemented by main() and commitSearch() in the realCGI.cpp file;
Commitsearch: runs the query program on the servers according to the query request;
IndexSvr 0-4, Ftp Server, HTML Server: respectively 4 p2p index servers, one ftp index server, and one http index server, each providing its own query service;
CollectResult: collects the query results returned from all servers and submits them to the user. Its function can be implemented by main() and commitSearch() in the realCGI.cpp file.
As shown in Fig. 4, the basic workflow of the gateway module GateWay is as follows:
- Initialize Configure
Initialize the configuration file: open the text file holding the configuration content and read the content into the buffer InitSetupBuffer, where it awaits parsing by later code. In the specific implementation, Identify.cpp preferably contains two InitSetup functions, one taking a parameter char* path and one taking no parameter; the latter simply calls the former with NULL as the argument. Consequently, in the initialization of GateWay, the configuration file used directly is the bingle.conf file.
- build BadMD5List
In a preferred embodiment, a list of Bad MD5 values is built for use by later queries that involve MD5. Specifically, the content of badmd5list.txt is read into the file input stream tmpif, and badMD5List is then built from the content of tmpif. In later steps, for example in the CheckBadMD5 described below, it is first determined whether the MD5 of the file being queried is in the Bad MD5 list, and the next action is chosen according to the result.
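A minimal Python sketch of building and consulting such a list is given below. The layout of badmd5list.txt is not specified in the text, so the sketch assumes one hexadecimal MD5 value per line and takes the lines as an in-memory list; the function names are hypothetical counterparts of buildBadMD5List() and checkBadMD5().

```python
def build_bad_md5_list(lines):
    """Build a lookup set from the lines of a bad-MD5 text file."""
    # Assumption: one hex MD5 per line; blank lines are ignored.
    return {line.strip().lower() for line in lines if line.strip()}


def check_bad_md5(md5, bad_md5_list):
    """Return True when the queried MD5 is on the bad list."""
    return md5.strip().lower() in bad_md5_list


# Hypothetical file content.
lines = [
    "D41D8CD98F00B204E9800998ECF8427E\n",
    "\n",
    "900150983cd24fb0d6963f7d28e17f72\n",
]
bad_md5_list = build_bad_md5_list(lines)
print(check_bad_md5("d41d8cd98f00b204e9800998ecf8427e", bad_md5_list))
```

A set gives constant-time membership checks, which matters because the check may run once per MD5 query.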
- Init Servers
Initialize the parameters of the available index servers, including the ip, port, and type of each server. Preferably, the data in InitSetupBuffer (which holds the content of bingle.conf) are written into an array of SearchServer objects: servers[h].Host, servers[h].Port, and servers[h].Version are loaded with the corresponding entries for server h in bingle.conf, and the array is referenced by the pointer servers.
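The server initialization can be illustrated with the following Python sketch. The syntax of bingle.conf is not given in the text, so the "serverN.field = value" layout used here is purely hypothetical, as is the assumption that the Version field carries the server type; only the field names Host, Port, and Version come from the description.

```python
from dataclasses import dataclass


@dataclass
class SearchServer:
    host: str = ""
    port: int = 0
    version: str = ""  # assumption: holds the server type, e.g. "p2p" or "ftp"


def init_servers(buffer_text):
    """Parse the configuration buffer into a list of SearchServer records."""
    servers = {}
    for raw in buffer_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        name, _, field = key.partition(".")
        if not name.startswith("server") or not name[6:].isdigit():
            continue
        server = servers.setdefault(int(name[6:]), SearchServer())
        if field == "host":
            server.host = value
        elif field == "port":
            server.port = int(value)
        elif field == "version":
            server.version = value
    # Return the records ordered by server number h.
    return [servers[h] for h in sorted(servers)]


# Hypothetical configuration content standing in for InitSetupBuffer.
CONFIG = """
# hypothetical bingle.conf layout
server0.host = 10.0.0.1
server0.port = 9000
server0.version = p2p
server1.host = 10.0.0.2
server1.port = 9001
server1.version = ftp
"""
servers = init_servers(CONFIG)
print(servers)
```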
- Creat Thread ConnectServers
Establish connections with the multiple IndexSvr, Ftp Server, and HTML Server instances. See Fig. 5 for the detailed workflow. The program creates servernum SearchServer objects in total, corresponding to the servernum servers that provide queries. Each SearchServer object has an array of socket handles with socknum elements. GateWay finds the first i for which sockFlag[i] = -1, requests a new socket handle newfd, uses it to connect to the index server (IndexSvr, Ftp Server, or HTML Server) corresponding to this SearchServer object, and sets sockFD[i] to newfd. This is repeated for every server in servers.
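The slot search described above (find the first i with sockFlag[i] = -1 and store newfd in sockFD[i]) can be sketched as follows. Plain integers stand in for real socket handles, no network connection is made, and marking a used slot with 0 is an assumption, since the text states only that -1 denotes a free slot.

```python
FREE = -1  # value of sockFlag[i] for an unused slot, as in the description


class SearchServerSlots:
    """Per-server table of socket handles with socknum entries."""

    def __init__(self, socknum):
        self.sock_flag = [FREE] * socknum
        self.sock_fd = [FREE] * socknum

    def attach(self, newfd):
        """Store newfd in the first free slot; return its index, or -1 if none is free."""
        for i, flag in enumerate(self.sock_flag):
            if flag == FREE:
                self.sock_fd[i] = newfd
                self.sock_flag[i] = 0  # assumption: 0 marks the slot as in use
                return i
        return -1


slots = SearchServerSlots(socknum=2)
print(slots.attach(7), slots.attach(9), slots.attach(11))
```

In a real gateway the handle would come from creating and connecting a socket to the server's host and port before it is stored.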
The select() function is used so that the gateway module GateWay can block and then serve each index server when that server needs attention, and the query task queue uses the waitcount technique. In this way the gateway module GateWay can handle queries to multiple index servers more efficiently.
- Creat commitSearch
Referring to Fig. 6, commitSearch may specifically be a function whose job is to take one element from the query queue and complete the corresponding query. This step preferably comprises the following sub-steps:
-- Prepare Socket for search request (that is, Prepare Socket for CGIConnect)
A connection with the CGI program is established by means of a socket.
-- Accept and Push Search request into mClientSock
The user's query request is submitted to the CGI program, and the gateway module GateWay extracts the query request from the CGI program. If the extraction succeeds, the query request is put into GateWay's own query queue; that is, the socket of the request is put into the queue mClientSock.
-- Creat Thread commitSearch
Creat Thread commitSearch means creating a thread that executes the commitSearch function. In other words, one thread is created for each query task to complete the query, which realizes multithreaded operation.
As shown in Fig. 6, once the connection with the CGI module (the query program) has been established, GateWay begins listening to the CGI program. As soon as a query request arrives, it is put into the query queue (a shared data area, shared by multiple CommitSearch threads) and a CommitSearch thread is created to complete the query; that is, one thread is created for each query. In a preferred embodiment, the processing by which a thread completes a query is shown in Fig. 7. The results that the gateway module GateWay returns to the CGI program include: the total number of query results resultnum, the number of results shown to the user shownum, the data length of the query results outputlen, and the query result data data.
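The thread-per-query pattern with a shared queue can be sketched in Python as follows. The sketch replaces sockets with plain strings and uses Python's queue.Queue in place of mClientSock; the names client_queue, commit_search, and accept_requests are illustrative, and the placeholder string stands in for the real query work.

```python
import queue
import threading

client_queue = queue.Queue()  # stands in for the shared queue mClientSock
results = []
results_lock = threading.Lock()


def commit_search():
    """Take one pending request from the shared queue and complete it."""
    request = client_queue.get()
    answer = f"results for {request}"  # placeholder for the real query work
    with results_lock:
        results.append(answer)
    client_queue.task_done()


def accept_requests(requests):
    """Queue every incoming request and start one thread per request."""
    threads = []
    for request in requests:
        client_queue.put(request)
        worker = threading.Thread(target=commit_search)
        worker.start()
        threads.append(worker)
    for worker in threads:
        worker.join()


accept_requests(["music", "ebook", "video"])
print(sorted(results))
```

Because each thread pops exactly one element, a thread may end up serving a request other than the one that triggered its creation; every queued request is still served exactly once.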
As shown in Fig. 8, the commitSearch function executes within a thread as follows:
First, the socket handle waiting to be processed is taken from the queue mClientSock (mClientSock.pop()), and the query request is received (Recv request) and stored in a SearchStruct structure. If it is an MD5 query, the checkBadMD5() function is preferably called first to determine whether the MD5 is a Bad MD5. Next, the gateway module GateWay communicates with the CGI and with the index servers, sending the user's query request from the CGI to the servernum index servers (that is, the loop for (i = 0; i < servernum; i++)). The gateway module GateWay then receives the query results returned from each index server and integrates them. Finally, the gateway module GateWay submits the integrated query results to the user through the CGI.
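The flow of a single commitSearch call can be sketched as below, assuming each index server is a plain function that returns a list of strings. The field names resultnum, shownum, outputlen, and data follow the description; the function name handle_request, the max_show limit, and the choice to answer a Bad MD5 query with an empty result are assumptions, since the text says only that the next action depends on the check.

```python
def handle_request(request, servers, bad_md5_list, max_show=10):
    """Query every index server for one request and package the merged result."""
    md5 = request.get("md5")
    if md5 and md5 in bad_md5_list:
        # Assumption: a query for a bad MD5 is answered with an empty result.
        return {"resultnum": 0, "shownum": 0, "outputlen": 0, "data": ""}

    hits = []
    for i in range(len(servers)):  # mirrors for (i = 0; i < servernum; i++)
        hits.extend(servers[i](request["keyword"]))

    shown = hits[:max_show]
    data = "\n".join(shown)
    return {
        "resultnum": len(hits),                    # total number of results
        "shownum": len(shown),                     # results shown to the user
        "outputlen": len(data.encode("utf-8")),    # length of the result data
        "data": data,
    }


# Hypothetical index servers.
servers = [
    lambda kw: [kw + ".p2p"],
    lambda kw: [kw + ".ftp", kw + ".ftp2"],
]
print(handle_request({"keyword": "a"}, servers, set(), max_show=2))
```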
According to another aspect of the present invention, a retrieval method for use with the retrieval architecture of the present invention is also provided. As shown in Fig. 9, a preferred embodiment comprises the following steps:
Step A: collect file information for p2p resources, ftp resources, and http resources on the Internet;
Step B: build file indexes corresponding to the resource classes from the collection results;
Step C: receive the user's query request;
Step D: respond to the query request by performing the corresponding p2p, ftp, or http retrieval using the respective file indexes;
Step E: integrate the query results and return them to the user.
In the preferred embodiment, in step C the query request is first parsed by the CGI on the web server and then dispatched to each index server by the gateway module. In step D, each index server uses the parsed keywords to perform the query corresponding to its server type, and sends its query result to the gateway module for integration. In step E, the integrated query result is first returned to the gateway module, then passed to the CGI for processing, and finally presented to the user as a web page by the web server.
As shown in Fig. 10, the following steps are preferably also performed before step D:
Initialize the configuration file of the gateway module;
Initialize the parameters of the index servers, including the IP, port, and type of each index server;
Establish the connection between the gateway module and the index servers, preferably by means of sockets;
In step D, the gateway module starts the corresponding query on the index servers according to the query request. This step preferably comprises the following sub-steps:
Establish the connection between the gateway module and the CGI by means of a socket;
The gateway module extracts the query request from the CGI and, if the extraction succeeds, puts the query request into the query queue;
Create one thread for each query request in the query queue to complete the query.
For further details of the retrieval method of the present invention, refer to the description of the principle and operation of the embodiments of the retrieval architecture above.
According to a preferred embodiment of the present invention, providing the gateway module GateWay advantageously unifies p2p search, ftp search, and http search, so that the user has a single query entry point yet obtains the results of all three kinds of search. This has the following notable advantages:
1. The presence of p2p search and http search makes up for the shortage of resources of ftp search alone.
2. The presence of p2p search and ftp search makes up for the low download speed of http search caused by limits on the number of connections, bandwidth, server load, and the like.
3. The presence of ftp search and http search makes up for the unstable downloads of p2p search for less popular resources.
4. More complex processing of search results can also be added to the gateway module GateWay with little effort. If every query from every user had to start a CGI service process, the resource cost of processes would make the system resource occupancy very high, and a large number of requests could seriously overload the system. With the gateway module GateWay, one thread is started to handle each query request, which greatly saves system resources and also makes it possible to perform more complex processing of queries and query results in this module, such as filtering sensitive words or merging multiple mirrors of a resource.
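The kind of post-processing mentioned in item 4 can be illustrated with a short Python sketch. The text does not say how mirrors are recognized or how sensitive words are matched, so the sketch assumes that results with the same md5 are mirrors of one resource and that sensitive words are matched as lower-case substrings of the file name; the record fields and the function name postprocess are hypothetical.

```python
def postprocess(results, sensitive_words):
    """Drop results with sensitive names and merge mirrors that share an md5."""
    merged = {}
    for item in results:
        name = item["name"].lower()
        if any(word in name for word in sensitive_words):
            continue  # filtered out by the sensitive-word check
        entry = merged.setdefault(
            item["md5"], {"name": item["name"], "md5": item["md5"], "urls": []}
        )
        if item["url"] not in entry["urls"]:
            entry["urls"].append(item["url"])
    return list(merged.values())


# Hypothetical merged results from the ftp and http index servers.
raw_results = [
    {"name": "movie.avi", "md5": "aa", "url": "ftp://mirror1.example/movie.avi"},
    {"name": "movie.avi", "md5": "aa", "url": "http://mirror2.example/movie.avi"},
    {"name": "banned-tool.zip", "md5": "bb", "url": "http://example.com/banned-tool.zip"},
]
print(postprocess(raw_results, ["banned"]))
```

Running this step once in the gateway, rather than in every index server, is what a single integration point makes possible.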
It should be pointed out that the index creation module and the retrieval service module are interrelated. Because the three kinds of retrieval are to be unified, the entry point for index queries and the format of the query results that are output should be unified; this must also be reflected in the index creation process so that unified queries over the different file indexes can be satisfied.
In addition, although a preferred embodiment of the present invention comprises three parts, namely p2p search, ftp search, and http search, it should be understood that in other embodiments of the present invention any two of p2p search, ftp search, and http search may be integrated to form the retrieval architecture. Retrieval can likewise be realized through a unified interface such as a CGI program, providing the user with the results of the two kinds of retrieval.
The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the present invention is not to be regarded as limited to these descriptions. A person of ordinary skill in the art to which the present invention belongs may make simple deductions or substitutions without departing from the concept of the present invention, and all of these shall be regarded as falling within the scope of protection of the present invention.

Claims (10)

1. A retrieval architecture, comprising:
a resource collection module, configured to collect information on downloadable files on the Internet;
an index creation module, configured to build file indexes from the collection results;
an inquiry service module, configured to receive a user's query request and return the query result to the user; and
a retrieval service module, configured to respond to the query request and perform the corresponding query using the file indexes;
characterized in that the file indexes are built for at least two classes of resources among p2p resources, ftp resources, and http resources; the retrieval service module correspondingly comprises at least two of a p2p index server, an ftp index server, and an http index server, and further comprises a gateway module configured to receive the query request and dispatch it to each index server, and to receive and integrate the query results of the index servers and return them to the inquiry service module.
2. The retrieval architecture as claimed in claim 1, characterized in that said inquiry service module comprises a web server and a Common Gateway Interface (CGI) running on said web server; said query request is delivered to said index servers after being parsed by said CGI; said query results are submitted to said web server after being processed by said CGI; and said web server presents said query results to the user in the form of a web page.
3. The retrieval architecture as claimed in claim 2, characterized in that said gateway module comprises:
a configuration initialization unit, configured to initialize the configuration file of said gateway module;
an index server initialization unit, configured to initialize the relevant parameters of said index servers, including the IP, port and type of each index server;
a connection establishment unit, configured to establish connections with said CGI and with said index servers; and
a query start unit, configured to start the corresponding query on said index servers according to the query request.
4. The retrieval architecture as claimed in claim 3, characterized in that said connections comprise connections established via sockets.
5. The retrieval architecture as claimed in claim 3, characterized in that said query start unit comprises a thread creation unit configured to create, for each query request, one thread to complete the current query.
6. A retrieval method, characterized by comprising the following steps:
A. collecting the relevant file information of at least two classes of resources among p2p resources, ftp resources and http resources on the Internet;
B. building, from the collection results, file indexes corresponding to the resource classes;
C. receiving a user's query request;
D. responding to said query request by using the corresponding file indexes to perform the corresponding p2p, ftp or http retrieval respectively;
E. integrating the query results and returning them to the user.
7. The retrieval method as claimed in claim 6, characterized in that, in said step C, said query request is first parsed by a Common Gateway Interface (CGI) on a web server and then dispatched to each index server by a gateway module; in said step D, each index server uses the parsed keywords to perform the query corresponding to its server type, and each query result is sent to said gateway module for integration; in said step E, the integrated query results are first returned to said gateway module, then passed to said CGI for processing, and then presented to the user in the form of a web page by said web server.
8. The retrieval method as claimed in claim 7, characterized in that the following steps are further included before said step D:
D1. initializing the configuration file of said gateway module;
D2. initializing the relevant parameters of said index servers, including the IP, port and type of each index server;
D3. establishing connections between said gateway module and said CGI and between said gateway module and said index servers;
and said step D comprises the following step:
D4. said gateway module starting the corresponding query on said index servers according to the query request.
9. The retrieval method as claimed in claim 8, characterized in that, in said step D3, said connections are established via sockets.
10. The retrieval method as claimed in claim 9, characterized in that, in said step D4, a separate thread is created for each query request to complete the query.
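The gateway behaviour recited in the claims (index server parameters consisting of IP, port and type, socket connections, and one thread per query request) can be illustrated with the following minimal Python sketch. It is not the applicant's implementation: the names `IndexServer`, `query_over_socket`, `run_query` and `start_query` are hypothetical, and the wire format (send the keyword followed by a newline, read newline-separated hits until the server closes the connection) is an assumption, since the claims do not specify one.

```python
import socket
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class IndexServer:
    """Parameters initialized for each index server: IP, port and type."""
    ip: str
    port: int
    kind: str  # "p2p", "ftp" or "http"


def query_over_socket(server: IndexServer, keyword: str,
                      timeout: float = 5.0) -> List[str]:
    # Assumed wire format: keyword + newline out, newline-separated hits back.
    with socket.create_connection((server.ip, server.port), timeout=timeout) as conn:
        conn.sendall(keyword.encode("utf-8") + b"\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return [line for line in b"".join(chunks).decode("utf-8").splitlines() if line]


def run_query(servers: List[IndexServer], keyword: str,
              query_fn: Callable[[IndexServer, str], List[str]]) -> List[Tuple[str, str]]:
    """Dispatch the keyword to every index server and integrate the hits."""
    merged = []
    for server in servers:
        try:
            hits = query_fn(server, keyword)
        except OSError:
            hits = []  # an unreachable server contributes no hits
        merged.extend((server.kind, hit) for hit in hits)
    return merged


def start_query(servers: List[IndexServer], keyword: str,
                results: Dict[str, List[Tuple[str, str]]],
                query_fn: Callable[[IndexServer, str], List[str]] = query_over_socket
                ) -> threading.Thread:
    """Create one thread for this query request; it stores results[keyword]."""
    def worker() -> None:
        results[keyword] = run_query(servers, keyword, query_fn)

    thread = threading.Thread(target=worker)
    thread.start()
    return thread
```

In this sketch the caller joins the returned thread and then reads `results[keyword]`. Passing a different `query_fn` makes it possible to exercise the dispatch and integration logic without any running index servers.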

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810241856A CN101763392A (en) 2008-12-23 2008-12-23 Retrieval architecture and retrieval method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200810241856A CN101763392A (en) 2008-12-23 2008-12-23 Retrieval architecture and retrieval method

Publications (1)

Publication Number Publication Date
CN101763392A true CN101763392A (en) 2010-06-30

Family

ID=42494556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810241856A Pending CN101763392A (en) 2008-12-23 2008-12-23 Retrieval architecture and retrieval method

Country Status (1)

Country Link
CN (1) CN101763392A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020300A (en) * 2012-12-28 2013-04-03 杭州华三通信技术有限公司 Method and device for information retrieval
CN103309897A (en) * 2012-03-15 2013-09-18 深圳瓶子科技有限公司 Firmware publishing method and system
CN105701231A (en) * 2016-01-20 2016-06-22 深圳市迅雷网络技术有限公司 Network resource search system and method
CN107665203A (en) * 2016-07-27 2018-02-06 北京京东尚科信息技术有限公司 Method, apparatus and system for application retrieval more
CN108460084A (en) * 2018-01-18 2018-08-28 大象慧云信息技术有限公司 Company information fuzzy query method and system, computer equipment and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103309897A (en) * 2012-03-15 2013-09-18 深圳瓶子科技有限公司 Firmware publishing method and system
CN103020300A (en) * 2012-12-28 2013-04-03 杭州华三通信技术有限公司 Method and device for information retrieval
CN103020300B (en) * 2012-12-28 2017-04-12 杭州华三通信技术有限公司 Method and device for information retrieval
CN105701231A (en) * 2016-01-20 2016-06-22 深圳市迅雷网络技术有限公司 Network resource search system and method
CN105701231B (en) * 2016-01-20 2018-04-20 深圳市迅雷网络技术有限公司 Internet resources search system and method
CN107665203A (en) * 2016-07-27 2018-02-06 北京京东尚科信息技术有限公司 Method, apparatus and system for application retrieval more
CN108460084A (en) * 2018-01-18 2018-08-28 大象慧云信息技术有限公司 Company information fuzzy query method and system, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
US9785714B2 (en) Method and/or system for searching network content
CN107087001B (en) distributed internet important address space retrieval system
AU2001290363B2 (en) A method for searching and analysing information in data networks
CN100557603C (en) Method for updating a database, and server and file sharing network system
US6694307B2 (en) System for collecting specific information from several sources of unstructured digitized data
CN101320373B (en) Safety search engine system of website database
CN102761627B (en) Based on cloud network address recommend method and system and the relevant device of terminal access statistics
CN102710795B (en) Hotspot collecting method and device
US20090222426A1 (en) Computer-Implemented System And Method For Analyzing Search Queries
CN102073683A (en) Distributed real-time news information acquisition system
CN103389983A (en) Webpage content grabbing method and device applied to network crawler system
CN109101607B (en) Method, apparatus and storage medium for searching blockchain data
CN101763392A (en) Retrieval architecture and retrieval method
CN101211340A (en) Dynamic network crawler based on client end /service end
US10491606B2 (en) Method and apparatus for providing website authentication data for search engine
CN108154024B (en) Data retrieval method and device and electronic equipment
Lee et al. An effective approach to enhancing a focused crawler using Google
CN102622402B (en) Server, method and system for providing information search service by using sheaf of pages
CN104636368A (en) Data retrieval method and device and server
CN112597369A (en) Webpage spider theme type search system based on improved cloud platform
CN105930385A (en) Data crawling method and system
Langhnoja et al. Web usage mining to discover visitor group with common behavior using DBSCAN clustering algorithm
US20080086476A1 (en) Method for providing news syndication discovery and competitive awareness
Elfirdoussi et al. Popularity based web service search
KR100633534B1 (en) Web scrapping engine system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20100630