CN103902610A - Searching method and searching device - Google Patents

Info

Publication number
CN103902610A
Authority
CN
China
Prior art keywords
retrieval
search
time
search logic
logic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201210583885.3A
Other languages
Chinese (zh)
Inventor
侯志远
梁肖
于晓明
杨建武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Original Assignee
Peking University
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University, Peking University Founder Group Co Ltd, Beijing Founder Electronics Co Ltd
Priority to CN201210583885.3A
Publication of CN103902610A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9574 Browsing optimisation, e.g. caching or content distillation of access to content, e.g. by caching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a searching method. The searching method includes: displaying a meta search input window on a client to acquire search logic input by a user; determining, by a meta search engine, whether the search logic has been retrieved before and whether the time elapsed since the last retrieval does not exceed a preset value; when it is determined that the search logic has been retrieved before and the preset value is not exceeded, acquiring the existing retrieval result from a cache; and returning the retrieval result to the client. The invention further provides a searching device comprising a window module, a judging module, a caching module, and a returning module. The window module displays the meta search input window on the client to acquire the search logic input by the user. The judging module is used by the meta search engine to determine whether the search logic has been retrieved before and whether the time elapsed since the last retrieval does not exceed the preset value. The caching module acquires the existing retrieval result from the cache when it is determined that the search logic has been retrieved before and the preset value is not exceeded. The returning module returns the retrieval result to the client. The searching method and the searching device increase search speed.

Description

Searching method and device
Technical field
The present invention relates to the field of search, and in particular to a searching method and device.
Background art
A meta search engine, also called a multi-search engine, helps a user select and use a suitable search engine (or even several at once) from among multiple search engines through a unified user interface; it is a global control mechanism over multiple retrieval tools distributed across the network. A meta search engine consists of three parts: a retrieval request submission mechanism, a retrieval interface proxy mechanism, and a retrieval result display mechanism. "Request submission" implements the user's personalized retrieval settings, including which search engines to call, the retrieval time limit, and the limit on the number of results. "Interface proxy" translates the user's retrieval request into the formats required by the different search engines. "Result display" handles deduplication, merging, and output of the retrieval results from all source search engines. With a meta search engine, several search engines can be queried simultaneously and the retrieval results are returned in a categorized layout.
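The deduplication and merging performed by the "result display" mechanism can be illustrated with a minimal sketch. The source does not specify a merge strategy; the round-robin interleaving, the `merge_results` name, and the assumption that each result is a dictionary with a `url` field are all hypothetical choices made for illustration.

```python
from itertools import zip_longest


def merge_results(per_engine):
    """Interleave result lists from several engines and drop duplicate URLs.

    per_engine maps an engine name to its ordered list of results; each
    result is assumed (for this sketch) to be a dict with a "url" field.
    """
    seen = set()
    merged = []
    # Take the 1st result of every engine, then the 2nd of every engine, ...
    for tier in zip_longest(*per_engine.values()):
        for item in tier:
            if item is None:  # this engine returned fewer results
                continue
            if item["url"] not in seen:
                seen.add(item["url"])
                merged.append(item)
    return merged
```

A weighted merge, as hinted at by the per-engine weights mentioned in the background, would replace the round-robin order with a score-based sort.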
Meta search uses existing search engines to deliver the results of the search engines the user wants, and lets the user personalize the search according to his or her preferences, for example by setting weights for different search engines, so that the results obtained differ accordingly. This is a great convenience and frees the user from searching one engine at a time.
Each time a user accesses the meta search, the meta search accesses the search engines the user wants once or even several times. Taking Baidu as an example, when the number of meta search users reaches a certain level, with thousands of users using the meta search simultaneously, the meta search puts pressure on search engines such as Baidu, which may mistake the meta search for a malicious attack and block it, affecting users' normal use.
Summary of the invention
The present invention aims to provide a searching method and device to solve the above problem.
In an embodiment of the present invention, a searching method is provided, comprising: presenting a meta search input window on a client to obtain search logic input by a user; determining, by a meta search engine, whether the search logic has been retrieved before and whether the time elapsed since the last retrieval does not exceed a preset value; when it is determined that the search logic has been retrieved before and the preset value is not exceeded, obtaining the existing retrieval result from a cache; and returning the retrieval result to the client.
In an embodiment of the present invention, a searching device is provided, comprising: a window module, for presenting a meta search input window on a client to obtain search logic input by a user; a judging module, for the meta search engine to determine whether the search logic has been retrieved before and whether the time elapsed since the last retrieval does not exceed a preset value; a caching module, for obtaining the existing retrieval result from a cache when it is determined that the search logic has been retrieved before and the preset value is not exceeded; and a returning module, for returning the retrieval result to the client.
In the searching method and device of the above embodiments, because the search logic is matched against the cache, calls to the search engines are reduced and search speed is improved.
Brief description of the drawings
The accompanying drawings described herein provide a further understanding of the present invention and form a part of this application. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 shows a flowchart of a searching method according to an embodiment of the present invention;
Fig. 2 shows a flowchart of a searching method according to a preferred embodiment of the present invention;
Fig. 3 shows a block diagram of a searching device according to an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows a flowchart of a searching method according to an embodiment of the present invention, comprising:
Step S10: presenting a meta search input window on a client to obtain search logic input by a user;
Step S20: determining, by a meta search engine, whether the search logic has been retrieved before and whether the time elapsed since the last retrieval does not exceed a preset value;
Step S30: when it is determined that the search logic has been retrieved before and the preset value is not exceeded, obtaining the existing retrieval result from a cache;
Step S40: returning the retrieval result to the client.
In practice, users repeatedly retrieve certain words, such as "China", within a certain period of time, while the search results for a given word do not change in the search engine over a period of time. Search engine websites update the retrieval results for a keyword periodically, and the update frequency is fairly low; when the first results page is updated, the probability that the subsequent pages are also updated is 99%. The method sets aside a cache in the meta search to save users' retrieval results. When a user searches for the same content again and that content is found in the cache, the cached content is returned to the user directly. This reduces the unnecessary load placed on the search engines when users repeatedly search for the same word, and thereby reduces the risk of the machine hosting the meta search engine being blocked. It is also faster than querying the relevant search engines again.
In terms of resource utilization, the method trades local resources for network resources. Network resources are limited and uncontrollable, whereas local resources are controllable, so this trade is considered worthwhile.
Preferably, the method further comprises: determining, by the meta search engine, that the search logic has not been retrieved before, or that the time elapsed since the last retrieval exceeds the preset value; calling each search engine according to preset rules to retrieve the search logic; and returning the retrieval result to the client, refreshing it into the cache, and recording in the index that the search logic has been retrieved, together with the retrieval time.
The cache has expiry detection. When a user retrieves through the meta search, the meta search first checks whether the corresponding cache entry has passed its validity period. If it has, the relevant search engines are queried and the retrieval result is updated in the cache; if it has not, the cached result is considered valid. This ensures the accuracy of the search.
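The expiry check and refresh described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `SearchCache` class, the `get_or_fetch` method, the injectable `clock`, and the `fetch` callback standing in for the calls to the real search engines are all hypothetical names.

```python
import time


class SearchCache:
    """Cache of retrieval results with a validity period (the "preset value")."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the sketch can be tested
        self.entries = {}       # key -> (retrieval_time, results)

    def get_or_fetch(self, key, fetch):
        """Return (results, hit). fetch() is called only on a miss or expiry."""
        now = self.clock()
        entry = self.entries.get(key)
        # Hit: retrieved before AND time since last retrieval within the preset value.
        if entry is not None and now - entry[0] <= self.ttl:
            return entry[1], True
        # Miss or expired: query the engines, refresh the cache, record the time.
        results = fetch()
        self.entries[key] = (now, results)
        return results, False
```

The trade-off is governed by `ttl_seconds`: a longer validity period reduces calls to the search engines but increases the chance of returning stale results.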
Preferably, the cache uses a hash table as the index of the retrieval results it stores, and the meta search engine looks up the search logic in the index to determine whether the search logic has been retrieved before and whether the time elapsed since the last retrieval does not exceed the preset value. A hash table is a common data structure and is relatively easy to implement. With the cache designed as a hash table, a lookup takes O(1) time on average and O(n) in the worst case, when many keys collide.
Preferably, recording in the index that the search logic has been retrieved, together with the retrieval time, comprises: forming a character string from the search word, the search engine name, and the search type in the search logic that has not been retrieved before; calculating the MD5 value of the character string; and using the MD5 value as the key of the hash table to form an index record, and adding the retrieval time to the index record. Using an MD5 value is a common way to generate keys and is relatively easy to implement.
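Key generation can be illustrated with Python's standard `hashlib`. The source states only that the three fields are combined into one character string; the unit-separator delimiter and the `cache_key` name are assumptions of this sketch, chosen so that different field combinations do not produce the same string.

```python
import hashlib


def cache_key(search_word, engine_name, search_type):
    """Build the hash-table key: MD5 of search word + engine name + search type."""
    # Assumed delimiter: "\x1f" (ASCII unit separator) keeps ("ab", "c") and
    # ("a", "bc") from joining into the same string.
    joined = "\x1f".join((search_word, engine_name, search_type))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()
```

MD5 serves here only as a compact, well-distributed identifier, not as a security measure.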
Preferably, recording in the index that the search logic has been retrieved, together with the retrieval time, comprises: for search logic whose time since the last retrieval exceeds the preset value, adding the current retrieval time to its corresponding index record.
Preferably, determining by the meta search engine whether the search logic has been retrieved before and whether the time elapsed since the last retrieval does not exceed the preset value comprises: calculating the MD5 value of the obtained search logic; looking up the index with the calculated MD5 value; and, if an index record is found, comparing the retrieval time of the obtained search logic with the retrieval time in the index record, and judging whether the difference does not exceed the preset value.
In a preferred embodiment of the invention, the hash table is designed as follows:
A character string may be formed from the search word, the search engine name, and the search type (web page, picture, and so on) in the search logic, and its MD5 value taken as the key of the hash table. This essentially guarantees that there are no duplicate records (also called entries or nodes).
Each node may store the original web page, which is then parsed whenever a lookup hits the node. If the node is never hit, the parsing step is saved and the page is simply downloaded; if the node is hit, the web page has to be parsed again on each hit.
Alternatively, each node may store the data obtained after the web page is parsed, and return that data directly when a lookup hits. If the node is never hit, one parse is wasted; if it is hit, no further parsing is needed and the data can be taken directly from the node and returned to the user.
In a hash table, the position of an element is determined by a hash function. Taking the key K of a data element as the argument, the value computed through a certain functional relationship (called the hash function) is the storage address of the element, expressed as: Addr = H(key).
A suitable hash function can be constructed so that the values of H(key) are evenly distributed over the hash table, which speeds up addressing.
In a hash table, different keys may map to the same storage location, that is, K1 ≠ K2 but H(K1) = H(K2). A uniform hash function reduces collisions but cannot eliminate them. When a collision occurs it must be resolved, that is, the next available address must be found. Collision resolution methods include chaining, open addressing, and bucket addressing.
Methods for constructing the hash function include the direct addressing method, the digit analysis method, the mid-square method, the folding method, the division-remainder method, and the random number method. In the random number method, a random function is chosen and the value of the random function of the key is taken as its hash address, that is, H(key) = random(key), where random is a random function. The random number method is chosen here as the hash function because an MD5 value is already close to a random number, so its randomness can be reused.
The hash table is built as follows:
Step 1. Take the key of a data element and calculate its storage address in the hash table, D = H(key). If the storage space at address D is not yet occupied, store the data element there; otherwise a collision has occurred, and Step 2 is carried out.
Step 2. Using the specified collision resolution method, calculate the next storage address for the data element whose key is key. If the storage space at that address is not occupied, store the element there; otherwise repeat Step 2 until an unoccupied storage address is found.
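Steps 1 and 2 can be sketched with open addressing, one of the collision resolution methods the description lists. The source does not say which method or probe sequence is used; linear probing, the fixed capacity, the MD5-modulo mapping for H(key), and the `OpenAddressingTable` name are assumptions made for illustration.

```python
import hashlib


class OpenAddressingTable:
    """Fixed-size hash table that resolves collisions by linear probing."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.slots = [None] * capacity  # each slot: None or (key, value)

    def _home(self, key):
        # H(key): reuse the near-random MD5 digest, reduced to a table address.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % self.capacity

    def put(self, key, value):
        addr = self._home(key)                    # Step 1: D = H(key)
        for _ in range(self.capacity):
            slot = self.slots[addr]
            if slot is None or slot[0] == key:    # free slot, or same key: store
                self.slots[addr] = (key, value)
                return addr
            addr = (addr + 1) % self.capacity     # Step 2: try the next address
        raise RuntimeError("hash table is full")

    def get(self, key):
        addr = self._home(key)
        for _ in range(self.capacity):
            slot = self.slots[addr]
            if slot is None:                      # empty slot ends the probe chain
                return None
            if slot[0] == key:
                return slot[1]
            addr = (addr + 1) % self.capacity
        return None
```

This sketch omits deletion and resizing, which a production table using open addressing would need.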
In addition, a user's search carries individual requirements on the number of results and the starting position of the results. If the cache is valid but the number of results in the cache cannot meet the user's requirement, the relevant search engines are still queried and the cache is updated.
Fig. 2 shows a flowchart of a searching method according to a preferred embodiment of the present invention, comprising:
Step S210: receiving the search logic input by the user through the search window of the meta search, and extracting from it the search word and the names of the engines to be called;
Step S212: forming a character string from the search word and the engine name, and calculating its MD5 value;
Step S214: matching the MD5 value against the hash-table index of the cache;
Step S220: if the match hits, judging whether the time at which the search was submitted exceeds the validity period of the record;
Step S222: if the validity period is exceeded, or if the match misses, retrieving the first results page from the search engine;
Step S224: comparing it with the corresponding record in the hash table;
Step S226: if the comparison shows they are identical, or if the validity period is not exceeded, judging whether the number of results meets the user's requirement;
Step S228: if the user's requirement is met, returning the results to the user;
Step S230: if the number of results does not meet the user's requirement, continuing the retrieval and downloading subsequent pages;
Step S232: if the comparison shows they differ, or if the number of results did not meet the user's requirement and subsequent pages have been retrieved and downloaded, updating the corresponding record in the hash table.
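The flow of steps S214 to S232 can be sketched as a single function. This is a simplified illustration under several assumptions: the cache is a plain dictionary, `fetch_page(n)` is a hypothetical callback returning the n-th results page of one engine as a list, the user's requirement is reduced to a result count (the starting position is ignored), and the record's timestamp is refreshed when an expired record turns out to be unchanged, which the source does not state explicitly.

```python
def search(key, wanted, now, table, ttl, fetch_page):
    """Return up to `wanted` results for `key`, using `table` as the cache."""
    record = table.get(key)                                       # S214
    fresh = record is not None and now - record["time"] <= ttl    # S220
    changed = False
    if not fresh:
        first_page = fetch_page(1)                                # S222
        # S224: compare the fresh first page with the cached one.
        changed = record is None or record["pages"][0] != first_page
        if changed:
            record = {"pages": [first_page]}
        record["time"] = now          # assumed: restart the validity period
    results = [r for page in record["pages"] for r in page]
    next_page = len(record["pages"]) + 1
    while len(results) < wanted:                                  # S226 / S230
        page = fetch_page(next_page)
        if not page:                  # the engine has no more results
            break
        record["pages"].append(page)
        results.extend(page)
        next_page += 1
        changed = True
    if changed:
        table[key] = record                                       # S232
    return results[:wanted]                                       # S228
```

Comparing only the first page relies on the observation stated earlier in the description that subsequent pages almost always change when the first page does.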
Fig. 3 shows a block diagram of a searching device according to an embodiment of the present invention, comprising:
a window module 10, for presenting a meta search input window on a client to obtain search logic input by a user;
a judging module 20, for the meta search engine to determine whether the search logic has been retrieved before and whether the time elapsed since the last retrieval does not exceed a preset value;
a caching module 30, for obtaining the existing retrieval result from a cache when it is determined that the search logic has been retrieved before and the preset value is not exceeded;
a returning module 40, for returning the retrieval result to the client.
Because the device matches the search logic against the cache, calls to the search engines are reduced and search speed is improved.
Preferably, the device further comprises: a retrieval module, for the meta search engine to determine that the search logic has not been retrieved before, or that the time elapsed since the last retrieval exceeds the preset value, and to call each search engine according to preset rules to retrieve the search logic; and an update module, for returning the retrieval result to the client, refreshing it into the cache, and recording in the index that the search logic has been retrieved, together with the retrieval time.
Preferably, the cache uses a hash table as the index of the retrieval results it stores, and the meta search engine looks up the search logic in the index to determine whether the search logic has been retrieved before and whether the time elapsed since the last retrieval does not exceed the preset value.
Preferably, the update module comprises: a character string module, for forming a character string from the search word, the search engine name, and the search type of the search logic that has not been retrieved before; an MD5 module, for calculating the MD5 value of the character string; and a recording module, for using the MD5 value as the key of the hash table to form an index record, and adding the retrieval time to the index record.
A concrete application of the embodiment of the present invention is described below.
1. The user selects the search engines and the search type on the terminal to perform a personalized retrieval, for example:
Keyword       Search engine     Type
China         Baidu, Google     Web page
Beijing       Baidu, Yahoo      Picture
2. The meta search receives and parses the request sent by the terminal, obtaining the search word, the search engine names, the search type, the number of results the user wants, and the starting position of the results.
3. On first use, nothing is found in the cache, so the meta search searches through the normal search flow and fills the results into the cache.
4. The meta search assembles the results according to a certain protocol and returns them to the user's client.
5. The user again selects the search engines and the search type on the terminal to perform a personalized retrieval, for example:
Keyword       Search engine     Type
China         Baidu, Bing       Web page
Great Wall    Baidu, Yahoo      Picture
China         Baidu, Google     Picture
6. The meta search parses the request to obtain the search word, the search engines, the search type, the number of results the user wants, and the starting position of the results.
7. It is determined that the Baidu web page search for "China" is in the cache, whereas the Bing web page search for "China" is not, so the Bing search engine is queried for "China".
8. The Baidu and Bing search results for the keyword "China" are assembled and returned to the user.
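The partial cache hit in this example (Baidu cached, Bing not) can be sketched as below. This is an illustration only: the `meta_search` function and the `query_engine` callback are hypothetical, a tuple of (keyword, engine, type) stands in for the MD5 key for readability, and expiry checking is omitted.

```python
def meta_search(keyword, engines, search_type, cache, query_engine):
    """Return (results per engine, engines actually queried)."""
    combined = {}
    fetched = []
    for engine in engines:
        # One cache record per (keyword, engine, type) combination, so a
        # request can be served partly from the cache and partly from engines.
        key = (keyword, engine, search_type)
        if key not in cache:
            cache[key] = query_engine(engine, keyword, search_type)
            fetched.append(engine)
        combined[engine] = cache[key]
    return combined, fetched
```

Keying records per engine is what lets the second request for "China" reuse the Baidu results while querying only the newly selected engine.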
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; or they can each be made into an individual integrated circuit module, or multiple modules or steps among them can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A searching method, characterized by comprising:
presenting a meta search input window on a client to obtain search logic input by a user;
determining, by a meta search engine, whether the search logic has been retrieved before and whether the time elapsed since the last retrieval does not exceed a preset value;
when it is determined that the search logic has been retrieved before and the preset value is not exceeded, obtaining the existing retrieval result from a cache;
returning the retrieval result to the client.
2. The method according to claim 1, characterized by further comprising:
determining, by the meta search engine, that the search logic has not been retrieved before, or that the time elapsed since the last retrieval exceeds the preset value;
calling each search engine according to preset rules to retrieve the search logic;
returning the retrieval result to the client and refreshing it into the cache, and recording in the index that the search logic has been retrieved, together with the retrieval time.
3. The method according to claim 2, characterized in that the cache uses a hash table as the index of the retrieval results it stores, and the meta search engine looks up the search logic in the index to determine whether the search logic has been retrieved before and whether the time elapsed since the last retrieval does not exceed the preset value.
4. The method according to claim 3, characterized in that recording in the index that the search logic has been retrieved, together with the retrieval time, comprises:
forming a character string from the search word, the search engine name, and the search type in the search logic that has not been retrieved before;
calculating the MD5 value of the character string;
using the MD5 value as the key of the hash table to form an index record, and adding the retrieval time to the index record.
5. The method according to claim 4, characterized in that recording in the index that the search logic has been retrieved, together with the retrieval time, comprises:
for the search logic whose time since the last retrieval exceeds the preset value, adding the current retrieval time to its corresponding index record.
6. The method according to claim 5, characterized in that determining by the meta search engine whether the search logic has been retrieved before and whether the time elapsed since the last retrieval does not exceed the preset value comprises:
calculating the MD5 value of the obtained search logic;
looking up the index with the calculated MD5 value;
if an index record is found, comparing the retrieval time of the obtained search logic with the retrieval time in the index record, and judging whether the difference does not exceed the preset value.
7. A searching device, characterized by comprising:
a window module, for presenting a meta search input window on a client to obtain search logic input by a user;
a judging module, for the meta search engine to determine whether the search logic has been retrieved before and whether the time elapsed since the last retrieval does not exceed a preset value;
a caching module, for obtaining the existing retrieval result from a cache when it is determined that the search logic has been retrieved before and the preset value is not exceeded;
a returning module, for returning the retrieval result to the client.
8. The device according to claim 7, characterized by further comprising:
a retrieval module, for the meta search engine to determine that the search logic has not been retrieved before, or that the time elapsed since the last retrieval exceeds the preset value, and to call each search engine according to preset rules to retrieve the search logic;
an update module, for returning the retrieval result to the client and refreshing it into the cache, and recording in the index that the search logic has been retrieved, together with the retrieval time.
9. The device according to claim 8, characterized in that the cache uses a hash table as the index of the retrieval results it stores, and the meta search engine looks up the search logic in the index to determine whether the search logic has been retrieved before and whether the time elapsed since the last retrieval does not exceed the preset value.
10. The device according to claim 9, characterized in that the update module comprises:
a character string module, for forming a character string from the search word, the search engine name, and the search type of the search logic that has not been retrieved before;
an MD5 module, for calculating the MD5 value of the character string;
a recording module, for using the MD5 value as the key of the hash table to form an index record, and adding the retrieval time to the index record.
CN201210583885.3A 2012-12-28 2012-12-28 Searching method and searching device Pending CN103902610A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210583885.3A CN103902610A (en) 2012-12-28 2012-12-28 Searching method and searching device

Publications (1)

Publication Number Publication Date
CN103902610A (en) 2014-07-02

Family

ID=50993938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210583885.3A Pending CN103902610A (en) 2012-12-28 2012-12-28 Searching method and searching device

Country Status (1)

Country Link
CN (1) CN103902610A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101231636A (en) * 2007-01-25 2008-07-30 北京搜狗科技发展有限公司 Convenient information search method, system and an input method system
CN102214172A (en) * 2010-04-06 2011-10-12 腾讯科技(深圳)有限公司 Caching method and caching equipment
CN102479207A (en) * 2010-11-29 2012-05-30 阿里巴巴集团控股有限公司 Information search method, system and device
CN102693308A (en) * 2012-05-24 2012-09-26 北京迅奥科技有限公司 Cache method for real time search

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018120876A1 (en) * 2016-12-29 2018-07-05 北京奇艺世纪科技有限公司 Method and device for searching for cache update
US11734276B2 (en) * 2016-12-29 2023-08-22 Beijing Qiyi Century Science & Technology Co., Ltd. Method and apparatus for updating search cache to improve the update speed of hot content
CN110928998A (en) * 2019-12-09 2020-03-27 南开大学 Latin side search engine based on equivalence class representative element index and storage
CN110928998B (en) * 2019-12-09 2023-04-14 南开大学 Latin side search engine based on equivalence class representative element index and storage

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140702
