CN107291956A - System and method for quickly querying website indexing information - Google Patents

Info

Publication number
CN107291956A
Authority
CN
China
Prior art keywords
keyword
server
record
website
weights
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710633236.2A
Other languages
Chinese (zh)
Other versions
CN107291956B (en)
Inventor
温广意
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Focus Leading Cloud Computing Technology Co Ltd
Original Assignee
Nanjing Focus Leading Cloud Computing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Focus Leading Cloud Computing Technology Co Ltd
Priority to CN201710633236.2A
Publication of CN107291956A
Application granted
Publication of CN107291956B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer And Data Communications (AREA)

Abstract

A system for quickly querying website indexing information comprises a website keyword query system, a keyword distribution system, a keyword indexing query system and a keyword indexing statistics system. After the website keyword query system obtains a website's keywords, the keyword distribution system distributes them to the servers; the keyword indexing query system sends requests to a search engine according to the keywords on each server; and the keyword indexing statistics system aggregates and counts the keyword indexing data obtained, forming a keyword statistics report. Each server comprises an application server and a proxy server, the application server supporting the running of the actual program. The website keyword query system uses the address of the website whose indexing is to be queried to collect and store the keywords of the pages in that website that search engines are allowed to index. The keyword distribution system monitors how keyword requests are running on each application server and proxy server.

Description

System and method for quickly querying website indexing information
Technical field
The invention belongs to the field of internet search design and relates to web crawler technology, specifically a system and method for quickly querying website indexing information.
Background
China currently has more than 50 million small and medium-sized enterprises (SMEs), accounting for more than 99% of all enterprises in the country. The final products and services they create account for nearly 60% of GDP, and they provide 75% of urban jobs, making them the most active economic sector in China now and for the future. With the development of the internet, almost all of these SMEs need to set up their own website to publish their products and services. In the traditional website development process, "the client's website is finished" means that the basic data of the site, such as all pages, products and articles, has been completed; the site's later operation, its indexing status, and whether its keyword selection needs optimizing receive no further attention.
In fact, website owners build a website so that search engines will index it; people around the world can then search for a term on a search engine and locate the website faster and more easily, which helps the website gain more traffic. However, for a long time after many websites are built and published, their owners do not know the site's indexing status or, for the keywords they chose, indexing information such as the monthly average search volume on the search engine, CPC (cost per click), degree of competition and KEI (Key Performance Indicators). A search engine is a system or software module that collects information from the internet with specific computer programs according to a certain strategy, organizes and processes that information, provides a retrieval service to users, and shows users the information relevant to their queries. An API (application programming interface) is a set of predefined functions intended to give applications and developers the ability to access a set of routines based on some software or hardware.
How to help website owners obtain website indexing information promptly and quickly has therefore become the key problem for current technology to solve.
Summary of the invention
To solve the above problem, the object of the invention is to provide a system and method for quickly querying website indexing information. The keywords of a client website are crawled and analyzed to obtain all of the site's keywords; a simulated user search then obtains the indexing information for those keywords in the corresponding search engine; finally, the search engine's API is called to obtain each keyword's monthly average search volume, CPC (cost per click), degree of competition, KEI (Key Performance Indicators) and so on.
The technical solution adopted by the invention is as follows:
A system for quickly querying website indexing information comprises a website keyword query system, a keyword distribution system, a keyword indexing query system and a keyword indexing statistics system. After the website keyword query system obtains a website's keywords, the keyword distribution system distributes them to the servers. The keyword indexing query system sends requests to the search engine according to the keywords on each server and determines each keyword's indexing status by analyzing the returned data. The keyword indexing statistics system aggregates and counts the keyword indexing data obtained, forming a keyword statistics report.
Each server comprises an application server and a proxy server. The application server supports the running of the actual program, and the proxy server supports access to the search engine.
The website keyword query system uses the address of the website whose indexing is to be queried to collect and store the keywords of the pages in that website that search engines are allowed to index.
The keyword distribution system monitors how keyword requests are running on each application server and proxy server. Its main functions are: (1) distributing unallocated keywords to idle proxy servers and setting those keywords to the "allocated" state; (2) setting keywords on a demoted (weight-reduced) proxy server that have not yet been submitted to the search engine to the "not searched" state; (3) monitoring whether a demoted proxy server, after its weight has been increased, meets the standard for calling the search engine, and setting the state of proxy servers that meet the standard to "available".
The keyword indexing query system combines "simulated manual requests" with "machine weight control": it uses a keyword as the search condition and analyzes the returned search results to obtain the indexing status of the website's keywords in the search engine.
The keyword indexing statistics system performs statistical processing on the data from a run once the indexing query for the client website's keywords has finished. Using MAP-REDUCE, it summarizes the keywords and generates a keyword statistics curve chart and a keyword statistics list, so that the client's keywords can be compared and reviewed systematically.
A method for quickly querying website indexing information comprises the following steps:
Step 1: Website keyword acquisition, performed by the website keyword query system. Specifically: the website's robots file and siteMap file are obtained via the website address, the data in the robots file and the siteMap file is traversed, and the links that the website allows search engines to index are extracted.
For each link, the corresponding HTML is fetched and analyzed to find the <meta name="keywords" content="XXX"> tag. The XXX in the tag's content attribute is the keyword content of that link. Using symbols such as ",", "|", ";", "、" and "." as the splitting basis, the keyword content is split into phrases, invalid phrases are filtered out, and the core phrases are extracted.
Further, on the basis of the core phrases, new words are added before and after each core phrase to form derived phrases.
The core phrases and derived phrases make up the final keyword data for the search-engine indexing query. This data is stored in a storage medium, ready for the keyword distribution system and the keyword indexing query system.
In particular, if there is no robots file, all pages are crawled by default.
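The keyword extraction in Step 1 can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the patented implementation: the separator set, the INVALID stop list and the prefix and suffix words are assumptions chosen for the example.

```python
import re
from html.parser import HTMLParser

# Assumed separator set: comma, semicolon and full stop (ASCII and full-width), "|" and "、".
SEPARATORS = re.compile(r"[,，|;；、。.]+")
# Hypothetical stop list of "invalid" phrases (company-name fragments).
INVALID = {"co", "ltd", "inc"}


class MetaKeywordParser(HTMLParser):
    """Collects the content attribute of <meta name="keywords" ...> tags."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords" and attr.get("content"):
            self.contents.append(attr["content"])


def core_phrases(html):
    """Split the keyword content on the separators and drop invalid or repeated phrases."""
    parser = MetaKeywordParser()
    parser.feed(html)
    phrases = []
    for content in parser.contents:
        for part in SEPARATORS.split(content):
            phrase = part.strip()
            if phrase and phrase.lower() not in INVALID and phrase not in phrases:
                phrases.append(phrase)
    return phrases


def derived_phrases(core, prefixes=("China",), suffixes=("Manufacturer",)):
    """Add a word before or after each core phrase (example words are assumptions)."""
    derived = []
    for phrase in core:
        derived += [f"{p} {phrase}" for p in prefixes]
        derived += [f"{phrase} {s}" for s in suffixes]
    return derived
```

Calling core_phrases on a page's HTML and passing the result to derived_phrases yields the combined keyword data that would then be stored for the later steps.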
Step 2: Website keyword distribution. The application server accesses the search engine through the proxy server and requests the allocation of keywords; this step is performed by the keyword distribution system and comprises:
Step 201: Poll all proxy servers whose service state is "idle" and determine whether each proxy server's weight exceeds the preset minimum keyword indexing weight. If it exceeds the minimum weight, go to step 202; if it is below the minimum weight, go to step 203.
Step 202: If the weight is greater than the minimum keyword indexing weight, distribute unallocated keywords to the idle server and set those keywords to the "allocated" state.
Step 203: If the weight is below the minimum keyword indexing weight, check when the server's weight was last increased. If the time elapsed since the last increase is greater than or equal to the preset weight-increase interval, increase the weight on top of the server's current weight. If, after the increase, the machine's weight is greater than or equal to the minimum keyword indexing weight, allocate some keywords to the current machine; if it is still below the minimum keyword indexing weight, poll the next idle proxy server.
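One polling pass over steps 201 to 203 might look like the sketch below. The Proxy class, the batch size and the numeric defaults (minimum weight, increment, interval) are illustrative assumptions, and a weight exactly equal to the minimum is treated as sufficient, which the text leaves open.

```python
from dataclasses import dataclass, field


@dataclass
class Proxy:
    name: str
    weight: int
    idle: bool = True
    last_increase: float = 0.0  # time of the last weight increase, in seconds
    assigned: list = field(default_factory=list)


def allocate(proxies, pending, now, min_weight=4000, step=75, interval=180.0, batch=2):
    """Run one polling pass; returns the keywords that are still unallocated."""
    for proxy in proxies:
        if not proxy.idle or not pending:
            continue
        if proxy.weight < min_weight:
            # Step 203: raise the weight only if the preset interval has elapsed.
            if now - proxy.last_increase >= interval:
                proxy.weight += step
                proxy.last_increase = now
            if proxy.weight < min_weight:
                continue  # still too low: move on to the next idle proxy
        # Step 202: hand out unallocated keywords; they now count as "allocated".
        for _ in range(min(batch, len(pending))):
            proxy.assigned.append(pending.pop(0))
        proxy.idle = False
    return pending
```

A scheduler would call allocate repeatedly with the current time, so that demoted proxies gradually regain enough weight to receive keywords again.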
Step 3: Keyword indexing query, performed by the keyword indexing query system. The system polls the keywords that have been allocated to proxy servers and, by means of "simulated manual requests", sends a request to the search engine with each polled keyword as the search condition. Depending on the data the search engine returns, one of the following operations is performed:
Step 3-1: When the search engine denies the access request, the server's denied-access count is incremented by 1 and checked against the peak value. If the peak has been reached, the current application server's access is terminated, the keywords allocated to the current proxy server whose indexing query has not completed are set to the "unallocated" state, and the proxy server's weight is set to the current weight / 2. If the peak has not been reached, the proxy server enters a sleep state and sends the request to the search engine again when the sleep ends.
When a proxy server has been demoted because its denied-access count reached the peak, then, to keep the search engine from detecting frequent access by the server, by default at least one hour must pass before it can perform keyword indexing queries again. The size of each weight increase is set according to how quickly the server should return to working state, using 1 hour as the basis for calculation: the difference between the minimum weight and the server's current weight is distributed evenly over the hour.
In particular, when a proxy server is refused by the search engine and the number of refusals reaches the specified peak, and the server's weight is not 0, the new weight = current weight / 2 and the time of this demotion is recorded. At fixed intervals, the weight of the demoted server is increased by a certain amount. If the increased weight is high enough for the server to call the search engine, the server's state is set to "available"; otherwise the weight is increased and checked again the next time the server is polled.
In particular, when a proxy server is refused by the search engine and the number of refusals reaches the specified peak, and the server's weight is 0, the current server is set to the "unavailable" state and operations staff are notified.
In particular, keywords on a demoted server that have not yet been submitted to the search engine are set to the "not searched" state.
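The refusal handling of step 3-1 could be sketched as below. The ProxyState class, the peak value of 3, the "demoted" status label and the returned action strings are hypothetical; the keyword state label follows the "unallocated" wording of step 3-1.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProxyState:
    weight: int
    rejections: int = 0
    status: str = "available"
    demoted_at: Optional[float] = None  # time of the latest demotion
    keywords: dict = field(default_factory=dict)  # keyword -> state


def on_rejected(proxy, now, peak=3):
    """Handle one refused request and return the action the caller should take."""
    proxy.rejections += 1
    if proxy.rejections < peak:
        return "sleep"  # below the peak: sleep, then retry the request
    # Peak reached: release every keyword whose indexing query has not completed.
    for keyword, state in proxy.keywords.items():
        if state != "completed":
            proxy.keywords[keyword] = "unallocated"
    if proxy.weight == 0:
        proxy.status = "unavailable"
        return "notify-ops"  # weight already 0: take the server out of service
    proxy.weight //= 2  # new weight = current weight / 2
    proxy.demoted_at = now
    proxy.status = "demoted"
    return "demoted"
```

Recording demoted_at lets a separate recovery task decide when the fixed interval for the next weight increase has elapsed.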
Step 3-2: When the search engine responds normally, the keyword indexing query system parses the returned set of website domain names, which is presented as multiple HTML result pages. Starting with the first HTML page, the system checks whether the address of the website being queried appears in the returned set. If it does, the returned website domain data is stored in the storage medium; if not, the set in the next HTML page is analyzed, and so on until the website address is found or the last HTML page has been reached. For a keyword whose indexing query has completed, the keyword's indexing state is changed to "completed" and the denied-access count of the proxy server holding the keyword is decremented by 1.
For keywords whose indexing state is "completed", the search engine's API is called to obtain the keyword's monthly average search volume, CPC (cost per click), degree of competition and KPI (Key Performance Indicators).
In particular, the server's sleep period varies with its denied-access count: the more times the server has been denied access, the longer its sleep period.
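The page-by-page scan of step 3-2 can be illustrated with the following sketch. It assumes the result URLs of each results page have already been extracted into lists (fetching and parsing the search engine's HTML is out of scope here), and it compares host names after dropping a leading "www.", which is a simplification chosen for the example.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the lower-cased host name without a leading 'www.'."""
    host = urlparse(url if "//" in url else "//" + url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def find_indexed_page(target_url, result_pages):
    """Scan result pages in order and stop at the first page listing the target site.

    result_pages is an iterable of lists of result URLs, one list per results page,
    so a generator can be passed to fetch later pages only when needed.
    """
    target = host_of(target_url)
    for page_number, urls in enumerate(result_pages, start=1):
        hits = [u for u in urls if host_of(u) == target]
        if hits:
            return page_number, hits
    return None, []  # the last page was reached without finding the site
```

A return value of (None, []) corresponds to a keyword for which the site was not found in any results page.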
Step 4: Keyword statistics, performed by the keyword indexing statistics system. Specifically: after the indexing query for the website's keywords has finished, the data from the run is processed statistically. Using MAP-REDUCE, the keywords are summarized and a keyword statistics curve chart and a keyword statistics list are generated, so that the keywords can be compared and reviewed systematically.
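As a rough, single-process illustration of the map-reduce style summary in Step 4, the sketch below maps each query record to a (keyword, statistics) pair, groups the pairs by keyword and reduces each group. The record fields "keyword", "indexed" and "page" are hypothetical; the text does not specify the data layout or the framework used.

```python
from collections import defaultdict
from functools import reduce


def map_record(record):
    """Map phase: emit (keyword, partial statistics) for one query record."""
    stats = {"queries": 1, "indexed": int(record["indexed"]), "best_page": record["page"]}
    return record["keyword"], stats


def reduce_stats(a, b):
    """Reduce phase: merge two partial statistics for the same keyword."""
    pages = [p for p in (a["best_page"], b["best_page"]) if p is not None]
    return {
        "queries": a["queries"] + b["queries"],
        "indexed": a["indexed"] + b["indexed"],
        "best_page": min(pages) if pages else None,
    }


def keyword_report(records):
    """Group mapped records by keyword and reduce each group to one summary row."""
    groups = defaultdict(list)
    for key, value in map(map_record, records):
        groups[key].append(value)
    return {key: reduce(reduce_stats, values) for key, values in groups.items()}
```

The resulting dictionary is the kind of per-keyword list from which a statistics table or curve chart could be drawn.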
The invention has the following beneficial effects:
(1) The invention obtains all of a website's keywords by crawling the website's robots file and siteMap file and, by combining simulated manual search with weight control, simulates browser requests to obtain the search engine's results. It thereby captures, automatically and quickly, the information about which of the website's keywords the search engine has indexed, giving a timely and clear picture of how the website's keywords rank in the search engine.
(2) By assigning weights to servers and controlling those weights, the invention controls both the distribution of website keywords among servers and the servers' calls to the search engine, forming a systematic and effective system for keyword crawling and indexing retrieval. This helps webmasters grasp a site's indexing status conveniently and quickly and then optimize the site's keywords in good time.
Brief description of the drawings
Fig. 1 is a structural diagram of a system for quickly querying website indexing information in an embodiment of the invention;
Fig. 2 is a flow chart of a method for quickly querying website indexing information in an embodiment of the invention;
Fig. 3 is a schematic flow chart of website keyword allocation in an embodiment of the invention;
Fig. 4 is a schematic flow chart of the keyword indexing query in an embodiment of the invention.
Detailed description
To make the object, technical solution and advantages of the invention clearer, the invention is described in detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a structural diagram of a system for quickly querying website indexing information in an embodiment of the invention. The system comprises servers 101, a website keyword query system 102, a keyword distribution system 103, a keyword indexing query system 104 and a keyword indexing statistics system 105. After the website keyword query system 102 obtains a website's keywords, the keyword distribution system 103 distributes them to the servers 101. The keyword indexing query system 104 sends requests to the search engine according to the keywords on each server 101 and determines each keyword's indexing status by analyzing the returned data. The keyword indexing statistics system 105 aggregates and counts the keyword indexing data obtained, forming a keyword statistics report.
Each server 101 comprises an application server 101-1 and a proxy server 101-2. The application server 101-1 supports the running of the actual program, and the proxy server 101-2 supports access to the search engine.
The website keyword query system 102 uses the address of the website whose indexing is to be queried to collect and store the keywords of the pages in that website that search engines are allowed to index.
The keyword distribution system 103 monitors how keyword requests are running on each application server and proxy server. Its main functions are: (1) distributing unallocated keywords to idle proxy servers and setting those keywords to the "allocated" state; (2) setting keywords on a demoted (weight-reduced) proxy server that have not yet been submitted to the search engine to the "not searched" state; (3) monitoring whether a demoted proxy server, after its weight has been increased, meets the standard for calling the search engine, and setting the state of proxy servers that meet the standard to "available".
The keyword indexing query system 104 combines "simulated manual requests" with "machine weight control": it uses a keyword as the search condition and analyzes the returned search results to obtain the indexing status of the website's keywords in the search engine.
The keyword indexing statistics system 105 performs statistical processing on the data from a run once the indexing query for the client website's keywords has finished. Using MAP-REDUCE, it summarizes the keywords and generates a keyword statistics curve chart and a keyword statistics list, so that the client's keywords can be compared and reviewed systematically.
Fig. 2 is a flow chart of a method for quickly querying website indexing information in an embodiment of the invention. The keywords of the client website are crawled and analyzed to obtain all of the site's keywords; a simulated user search then obtains the indexing information for those keywords in the corresponding search engine; finally, the search engine's API is called to obtain each keyword's monthly average search volume, CPC (cost per click), degree of competition and KEI (Key Performance Indicators). The specific flow comprises:
Step 201: Website keyword acquisition. The website keyword query system obtains the website's robots file and siteMap file via the website address, traverses the data in the robots file and the siteMap file, and extracts the links that the website allows search engines to index. If there is no robots file, all pages are crawled by default.
For each link, the corresponding HTML is fetched and analyzed to find the <meta name="keywords" content="XXX"> tag. The XXX in the tag's content attribute is the keyword content of that link. Using symbols such as ",", "|", ";", "、" and "." as the splitting basis, the content is split and invalid words such as the company-name term "Co.Ltd" are filtered out to obtain the core phrases. Further, on the basis of the core phrases, qualifiers such as "China" or "Manufacturer" are added before and after each core phrase to form derived phrases.
All core phrases and derived phrases make up the keyword data for the search-engine search. This data is stored in a storage medium, ready for the keyword distribution system and the keyword indexing query system.
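The link extraction at the start of step 201 might be sketched as follows, using the standard library's robots.txt parser and an XML sitemap in the sitemaps.org format. The function name, the "*" user agent and the assumption that the sitemap is a single urlset document are choices made for this example; fetching the two files is left out.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by sitemaps.org urlset documents.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def indexable_links(robots_txt, sitemap_xml, user_agent="*"):
    """Return the sitemap URLs that the robots rules allow a crawler to fetch.

    robots_txt may be None, in which case every page is treated as crawlable,
    matching the default described for a site without a robots file.
    """
    root = ET.fromstring(sitemap_xml)
    links = [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
    if robots_txt is None:
        return links
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [url for url in links if rules.can_fetch(user_agent, url)]
```

Each returned link would then be fetched and its keyword meta tag analyzed as described above.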
Step 202: The application server accesses the search engine through the proxy server and requests the allocation of keywords; this step is performed by the keyword distribution system. The specific flow, shown in Fig. 3 (the website keyword allocation flow in an embodiment of the invention), comprises:
Step 202-1: Determine the numbers of application servers 101-1 and proxy servers 101-2, and distribute the proxy servers evenly among the application servers.
Step 202-2: Poll all proxy servers whose service state is "idle" and determine whether each proxy server's weight exceeds the preset minimum keyword indexing weight. If it exceeds the minimum weight, go to step 202-3; if it is below the minimum weight, go to step 202-4. Here, a service state of "idle" means that none of the proxy server's keywords is currently in an indexing query.
Step 202-3: If the weight is greater than the minimum keyword indexing weight, distribute unallocated keywords to the idle proxy server and set those keywords to the "allocated" state.
Step 202-4: If the weight is below the minimum keyword indexing weight, check when the server's weight was last increased. If the time elapsed since the last increase is greater than or equal to the preset weight-increase interval, increase the weight on top of the server's current weight. If, after the increase, the machine's weight is greater than or equal to the minimum keyword indexing weight, allocate some keywords to the current machine; if it is still below the minimum keyword indexing weight, poll the next idle proxy server.
In particular, when a proxy server is refused by the search engine and the number of refusals reaches the specified peak, and the server's weight is not 0, the new weight = current weight / 2 and the time of this demotion is recorded. At fixed intervals, the weight of the demoted server is increased by a certain amount. If the increased weight is high enough for the server to call the search engine, the server's state is set to "available"; otherwise the weight is increased and checked again the next time the server is polled.
Step 203: Keyword indexing query. The keyword indexing query system polls the keywords that have been allocated to proxy servers and, by means of "simulated manual requests", sends a request to the search engine with each polled keyword as the search condition. Depending on the data the search engine returns, one of the following operations is performed:
Step 203-1: When the search engine denies the access request, the server's denied-access count is incremented by 1 and checked against the peak value. If the peak has been reached, the current application server's access is terminated, the keywords allocated to the current proxy server whose indexing query has not completed are set to the "not searched" state, and the proxy server's weight is set to the current weight / 2. If the peak has not been reached, the proxy server enters a sleep state and sends the request to the search engine again when the sleep ends.
When a proxy server has been demoted because its denied-access count reached the peak, then, to keep the search engine from detecting frequent access by the server, by default at least one hour must pass before it can perform keyword indexing queries again. The size of each weight increase is set according to how quickly the server should return to working state, using 1 hour as the basis for calculation: the difference between the minimum weight and the server's current weight is distributed evenly over the hour. For example: the default weight is 5000 and the minimum allocation weight is 4000. On failure the weight becomes 2500, and the server can be allocated keywords again only after one hour. If the weight is increased every 3 minutes, there are 20 increases in one hour (60 min / 3 min), so each increase is (4000 - 2500) / 20 = 1500 / 20 = 75.
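The arithmetic in the example above can be checked with a few lines of code. The function name and parameters are illustrative; only the numbers come from the example.

```python
def recovery_plan(min_weight, current_weight, recovery_minutes=60, tick_minutes=3):
    """Spread the gap between the minimum and the current weight evenly over the recovery period."""
    ticks = recovery_minutes // tick_minutes
    increment = (min_weight - current_weight) / ticks
    return ticks, increment


# Default weight 5000 is halved to 2500 on failure; the minimum allocation weight is 4000.
ticks, increment = recovery_plan(min_weight=4000, current_weight=5000 // 2)
print(ticks, increment)  # 20 increases of 75.0 each
```

After 20 increases of 75 the weight is back at 2500 + 1500 = 4000, the minimum allocation weight.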
Step 203-2: When the search engine responds normally, the keyword indexing query system parses the returned set of website domain names, which is presented as multiple HTML result pages. Starting with the first HTML page, the system checks whether the address of the website being queried appears in the returned set. If it does, the returned website domain data is stored in the storage medium; if not, the set in the next HTML page is analyzed, and so on until the website address is found or the last HTML page has been reached. For a keyword whose indexing query has completed, the keyword's indexing state is changed to "completed" and the denied-access count of the proxy server holding the keyword is decremented by 1.
For keywords whose indexing state is "completed", the search engine's API is called to obtain the keyword's monthly average search volume, CPC (cost per click), degree of competition and KPI (Key Performance Indicators).
In particular, the server's sleep period varies with its denied-access count: the more times the server has been denied access, the longer its sleep period.
In particular, when a proxy server is refused by the search engine and the number of refusals reaches the specified peak, and the server's weight is 0, the current server is set to the "unavailable" state and operations staff are notified.
In particular, keywords on a demoted server that have not yet been submitted to the search engine are set to the "not searched" state.
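The text states only that the sleep period grows with the denied-access count and that a completed query reduces the count by 1; it gives no formula. The sketch below assumes a simple linear rule with a cap, and floors the count at zero; the base and cap values are hypothetical.

```python
def sleep_seconds(rejections, base=30.0, cap=1800.0):
    """Assumed linear rule: one base interval per recorded refusal, up to a cap."""
    return min(cap, base * max(1, rejections))


def after_success(rejections):
    """A completed indexing query lowers the denied-access count by 1 (floored at 0 here)."""
    return max(0, rejections - 1)
```

Any monotonically increasing rule, for instance an exponential one, would satisfy the description equally well.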
Step 204: Keyword statistics. After the indexing query for the website's keywords has finished, the keyword indexing statistics system processes the data from the run statistically. Using MAP-REDUCE, it summarizes the keywords and generates a keyword statistics curve chart and a keyword statistics list, so that the keywords can be compared and reviewed systematically.
Those of ordinary skill in the art should understand that the above is only a specific embodiment of the invention and is not intended to limit the invention. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (2)

1. A system for quickly querying website indexing information, characterized in that it comprises a website keyword query system, a keyword distribution system, a keyword indexing query system and a keyword indexing statistics system; after the website keyword query system obtains a website's keywords, the keyword distribution system distributes them to the servers; the keyword indexing query system sends requests to the search engine according to the keywords on each server and determines each keyword's indexing status by analyzing the returned data; the keyword indexing statistics system aggregates and counts the keyword indexing data obtained, forming a keyword statistics report;
each server comprises an application server and a proxy server, the application server supporting the running of the actual program and the proxy server supporting access to the search engine;
the website keyword query system uses the address of the website whose indexing is to be queried to collect and store the keywords of the pages in that website that search engines are allowed to index;
the keyword distribution system monitors how keyword requests are running on each application server and proxy server: (1) distributing unallocated keywords to idle proxy servers and setting those keywords to the "allocated" state; (2) setting keywords on a demoted proxy server that have not yet been submitted to the search engine to the "not searched" state; (3) monitoring whether a demoted proxy server, after its weight has been increased, meets the standard for calling the search engine, and setting the state of proxy servers that meet the standard to "available";
the keyword indexing query system combines "simulated manual requests" with "machine weight control", using a keyword as the search condition and analyzing the returned search results to obtain the indexing status of the website's keywords in the search engine;
the keyword indexing statistics system performs statistical processing on the data from a run once the indexing query for the client website's keywords has finished and, using MAP-REDUCE, summarizes the keywords and generates a keyword statistics curve chart and a keyword statistics list, so that the keywords can be compared and reviewed systematically.
2. A method for quickly querying a website's search-engine inclusion information according to claim 1, characterized in that the specific steps comprise:
Step 1: website keyword acquisition, performed by the website keyword query system. Specifically: the website's robots file and siteMap file are obtained through the website address; the data in the robots file and the siteMap file are traversed, and the links that the website allows search engines to include are extracted;
According to the link information, the HTML corresponding to each link is obtained and its content is analyzed to find the <meta name="keywords" content="XXX"> tag, where the XXX in the content attribute is the keyword content of the corresponding link; using the symbols ", |;;、.." as the splitting basis, the keyword content is split into phrases, invalid phrases are filtered out, and the core phrases are extracted;
Further, on the basis of the core phrases, new words are added before and after each core phrase to form derived phrases;
The core phrases and the derived phrases are the keyword data ultimately used for the search-engine inclusion query; this data is stored in a storage medium in preparation for the keyword distribution system and the keyword inclusion query system;
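As an illustration of the keyword extraction in Step 1, the following minimal Python sketch reads the content attribute of a page's keywords meta tag, splits it on a delimiter set, filters invalid phrases, and builds derived phrases. The delimiter set, the invalid-phrase filter, and the prefix and suffix word lists are assumptions made for this example; the claim does not fix them.

```python
import re
from html.parser import HTMLParser

# Assumed delimiter set (ASCII and full-width punctuation); not fixed by the claim.
DELIMITERS = re.compile(r"[,|;，；、。]")


class MetaKeywordParser(HTMLParser):
    """Collects the content attribute of <meta name="keywords" ...>."""

    def __init__(self):
        super().__init__()
        self.content = ""

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        if tag == "meta" and name == "keywords":
            self.content = attributes.get("content") or ""


def core_phrases(html, invalid=frozenset()):
    """Split the keywords meta content; drop empty, invalid, or repeated phrases."""
    parser = MetaKeywordParser()
    parser.feed(html)
    phrases = []
    for part in DELIMITERS.split(parser.content):
        phrase = part.strip()
        if phrase and phrase not in invalid and phrase not in phrases:
            phrases.append(phrase)
    return phrases


def derived_phrases(core, prefixes=(), suffixes=()):
    """Combine each core phrase with hypothetical words placed before and after it."""
    derived = []
    for phrase in core:
        derived.extend(f"{prefix} {phrase}" for prefix in prefixes)
        derived.extend(f"{phrase} {suffix}" for suffix in suffixes)
    return derived
```

The union of the two returned lists would be the keyword data handed to the distribution stage; how it is persisted is left open here.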
If the website has no robots file, all pages are treated by default as open to crawling;
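The link filtering in Step 1, including the default when no robots file exists, can be sketched with Python's standard urllib.robotparser module. This is only an illustration: the function name and its parameters are hypothetical, and fetching the robots file and parsing the siteMap file are assumed to have happened elsewhere.

```python
from urllib.robotparser import RobotFileParser


def links_allowed_for_inclusion(robots_txt, sitemap_links, user_agent="*"):
    """Return the sitemap links that the robots rules allow a crawler to fetch.

    robots_txt is the text of the robots file, or None when the site has none,
    in which case every link is treated as allowed.
    """
    if robots_txt is None:
        return list(sitemap_links)
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [link for link in sitemap_links if rules.can_fetch(user_agent, link)]
```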
Step 2: website keyword distribution, performed by the keyword distribution system; the application server accesses the search engine through the proxy server and requests the allocation of keywords. Specifically:
Step 201: poll all proxy servers whose service state is "idle" and determine whether the proxy server's weight exceeds the preset minimum weight for keyword inclusion queries; if it exceeds the minimum weight, go to step 202; if it is below the minimum weight, go to step 203;
Step 202: if the weight is greater than the minimum weight for keyword inclusion queries, distribute unallocated keywords to the idle server and set these keywords to the "allocated" state;
Step 203: if the weight is less than the minimum weight for keyword inclusion queries, check the time at which the server's weight was last increased; if the interval between that time and the current time is greater than or equal to the preset weight-increase interval, increase the weight on the basis of the current server weight; if the machine's weight after the increase is greater than or equal to the minimum weight for keyword inclusion queries, allocate a portion of the keywords to the current machine; if the weight after the increase is still below that minimum weight, poll the next idle proxy server;
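Steps 201 to 203 can be sketched as a single polling pass. The following Python sketch is one possible reading, not the patented implementation: the ProxyServer fields, the fixed increment, the batch size, and marking a proxy busy after allocation are all assumptions, and a weight exactly equal to the minimum is treated as eligible because the claim leaves that case open in step 201.

```python
from dataclasses import dataclass, field


@dataclass
class ProxyServer:
    name: str
    weight: float
    idle: bool = True
    last_increase: float = 0.0          # time of the last weight increase, in seconds
    keywords: list = field(default_factory=list)


def distribute(proxies, unallocated, min_weight, increase_interval,
               increment, batch_size, now):
    """One polling pass over idle proxies (steps 201 to 203).

    Returns a dict mapping each keyword allocated in this pass to its proxy name.
    Allocated keywords are removed from the `unallocated` list.
    """
    allocated = {}
    for proxy in proxies:
        if not unallocated:
            break
        if not proxy.idle:
            continue
        if proxy.weight < min_weight:
            # Step 203: raise the weight only if enough time has passed.
            if now - proxy.last_increase >= increase_interval:
                proxy.weight += increment
                proxy.last_increase = now
            if proxy.weight < min_weight:
                continue  # still too low: poll the next idle proxy
        # Step 202: hand out a batch and record it as allocated.
        batch = unallocated[:batch_size]
        del unallocated[:batch_size]
        proxy.keywords.extend(batch)
        proxy.idle = False  # assumption: a proxy holding keywords is no longer idle
        for keyword in batch:
            allocated[keyword] = proxy.name
    return allocated
```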
Step 3: keyword inclusion query, performed by the keyword inclusion query system: the keywords that have been allocated to the proxy servers are polled; by means of "simulated manual requests", taking each polled keyword as the search condition, the keyword inclusion query system sends a request to the search engine and performs the specified operation according to the data the search engine returns:
Step 3-1: when the search engine rejects the access request, the server's rejected-access count is increased by 1, and it is determined whether the count has reached the peak; if the peak is reached, the current application server's access is terminated, the keywords allocated to the current proxy server whose inclusion query has not been completed are set to the "unallocated" state, and the proxy server's weight is set to the current proxy server's weight / 2; if the peak is not reached, the proxy server enters a dormant state and sends the request to the search engine again after the dormancy ends;
When a proxy server is demoted because its rejected-access count has reached the peak, "keyword inclusion query" can by default be performed again only after at least one hour, so that the search engine does not detect frequent access by the server; each weight increase is set according to how quickly the server should recover its working state, with 1 hour as the basis of calculation, i.e. the difference between the minimum weight and the server's current weight is distributed evenly over 1 hour;
When a proxy server is rejected by the search engine and the rejection count reaches the specified peak while the server weight is not 0, new weight = current weight / 2, and the time of the demotion is recorded; at fixed intervals, the weight of the demoted server is increased by some amount; if the server's weight after the increase is sufficient to call the search engine, the server state is set to "available"; otherwise the increase is evaluated again the next time the server is polled;
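The rejection handling of Step 3-1 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the ProxyState fields and the status names "sleeping" and "demoted" are hypothetical, the rejection count is not reset after demotion because the claim does not say so, and the zero-weight branch follows the special case stated for a server whose weight is already 0.

```python
from dataclasses import dataclass, field


@dataclass
class ProxyState:
    weight: float
    rejections: int = 0
    status: str = "available"
    keywords: list = field(default_factory=list)   # allocated, query not finished


def on_rejected(proxy, peak):
    """Record one refusal; return the keywords released back to "unallocated".

    Below the peak the proxy only sleeps and retries, so nothing is released.
    """
    proxy.rejections += 1
    if proxy.rejections < peak:
        proxy.status = "sleeping"
        return []
    released, proxy.keywords = proxy.keywords, []
    if proxy.weight == 0:
        proxy.status = "unavailable"      # operations staff would be notified here
    else:
        proxy.weight /= 2                 # new weight = current weight / 2
        proxy.status = "demoted"
    return released


def recovery_increment(min_weight, current_weight, interval_seconds):
    """Weight added per interval so the gap to min_weight closes in one hour."""
    return (min_weight - current_weight) * interval_seconds / 3600
```

For example, a server demoted from weight 8 to 4 with a minimum weight of 10 and a 10-minute interval would regain 1 unit of weight per interval under this reading.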
In particular, when a proxy server is rejected by the search engine and the rejection count reaches the specified peak while the server weight is 0, the current server is set to the "unavailable" state and the operations staff are notified;
Keywords on a demoted server that have not yet been submitted to the search engine are set to the "not searched" state;
Step 3-2: when the search engine returns normally, the keyword inclusion query system parses the returned set of website domain names, which is presented as multiple HTML pages; in the first HTML page, it checks whether the address of the website whose inclusion is to be queried is in the returned set of website domain names; if so, the currently returned website domain name data is stored in the storage medium; if not, the set of website domain names in the next HTML page is fetched and analyzed, until the website address is found or the last HTML page is reached; for a keyword whose inclusion query has completed, the keyword's inclusion state is changed to "completed" and the rejected-access count of the keyword's proxy server is decreased by 1;
For keywords whose inclusion state is "completed", the keyword's monthly ranking search volume, CPC (average cost per click), degree of competition, and KPI (key performance indicators) are obtained by calling the search engine's API;
The server's dormancy period changes adaptively with its rejected-access count: the more times the server has been refused access, the longer its dormancy period;
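The page-by-page check of Step 3-2 and the adaptive dormancy period can be sketched as below. This Python sketch assumes that the domains on each result page have already been extracted from the returned HTML and that the site address is an absolute URL; the function names, the base period, and the linear growth of the dormancy period are assumptions, since the claim states only that more refusals mean a longer dormancy.

```python
from urllib.parse import urlparse


def find_inclusion_page(result_pages, site_url):
    """Return the 1-based result page that lists the site's domain, or None.

    result_pages is an iterable of per-page domain collections, assumed to be
    already extracted from the search engine's HTML result pages.
    """
    target = urlparse(site_url).hostname
    for page_number, domains in enumerate(result_pages, start=1):
        if target in domains:
            return page_number       # found: stop before reading more pages
    return None                      # reached the last page without a match


def dormancy_seconds(rejections, base_seconds=30.0):
    """Dormancy grows with the refusal count (linear growth is an assumption)."""
    return base_seconds * (rejections + 1)
```

Passing a lazy iterator as result_pages would let later pages be fetched only when the earlier ones contain no match, which mirrors the "next HTML page" wording.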
Step 4: keyword statistics, performed by the keyword inclusion statistics system. Specifically: after the website's keyword inclusion query has finished, statistical processing is performed on the data; using MAP-REDUCE, the keywords are summarized, a keyword statistics curve chart and a keyword statistics list are generated, and the keywords are compared and reviewed systematically.
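The MAP-REDUCE aggregation of Step 4 can be illustrated in a single process with Python's built-in map and functools.reduce. This is a sketch of the pattern only, not a distributed job: the record layout (day, keyword, included) and the function names are assumptions, and the output is the per-day data from which a statistics curve could be drawn.

```python
from functools import reduce


def map_record(record):
    """Map one query result to a (day, (queried, included)) pair."""
    day, _keyword, included = record
    return day, (1, 1 if included else 0)


def reduce_counts(totals, pair):
    """Fold one mapped pair into the running per-day totals."""
    day, (queried, included) = pair
    prev_queried, prev_included = totals.get(day, (0, 0))
    totals[day] = (prev_queried + queried, prev_included + included)
    return totals


def inclusion_curve(records):
    """Per-day (queried, included) counts, sorted by day for plotting."""
    totals = reduce(reduce_counts, map(map_record, records), {})
    return sorted(totals.items())
```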
CN201710633236.2A 2017-07-28 2017-07-28 The system and method for record information is searched in a kind of quick search website Active CN107291956B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710633236.2A CN107291956B (en) 2017-07-28 2017-07-28 The system and method for record information is searched in a kind of quick search website

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710633236.2A CN107291956B (en) 2017-07-28 2017-07-28 The system and method for record information is searched in a kind of quick search website

Publications (2)

Publication Number Publication Date
CN107291956A true CN107291956A (en) 2017-10-24
CN107291956B CN107291956B (en) 2018-06-22

Family

ID=60102580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710633236.2A Active CN107291956B (en) 2017-07-28 2017-07-28 The system and method for record information is searched in a kind of quick search website

Country Status (1)

Country Link
CN (1) CN107291956B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060053092A1 (en) * 2004-09-01 2006-03-09 Chris Foo Method and system to perform dynamic search over a network
CN103902725A (en) * 2014-04-10 2014-07-02 百度在线网络技术(北京)有限公司 Method and device for acquiring search engine optimization information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
OPOKU-MENSAH EUGENE et al.: "Towards ranking cultural terms from originating source", 《IEEE》 *
张雄伟: "汽车配件网站 SEO", 《信息与电脑》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609203A (en) * 2017-11-07 2018-01-19 安徽斯百德信息技术有限公司 A kind of data analysis system and method for search engine optimization effect quantitative evaluation
CN110348940A (en) * 2019-05-28 2019-10-18 成都美美臣科技有限公司 A kind of method that e-commerce website search is suggested
CN111611508A (en) * 2020-05-28 2020-09-01 江苏易安联网络技术有限公司 Identification method and device for actual website access of user
CN111611508B (en) * 2020-05-28 2020-12-15 江苏易安联网络技术有限公司 Identification method and device for actual website access of user
CN113010636A (en) * 2021-02-23 2021-06-22 玉米社(深圳)网络科技有限公司 Method for rapidly detecting ranking of all keywords of website

Also Published As

Publication number Publication date
CN107291956B (en) 2018-06-22

Similar Documents

Publication Publication Date Title
CN107291956B (en) The system and method for record information is searched in a kind of quick search website
JP6582085B2 (en) Method and apparatus for generating web page content
CN102663048B (en) Method and device for providing search result
Bachlechner et al. Web service discovery-a reality check
CN102932206B (en) The method and system of monitoring website access information
CN104216921B (en) A kind of addition reminding method, apparatus and system for realizing quick links in browser
CN102663617A (en) Method and system for prediction of advertisement clicking rate
CN106126648B (en) It is a kind of based on the distributed merchandise news crawler method redo log
CN102043783A (en) Data updating method, device and system
CN105512153A (en) Method and device for service provision of online customer service system, and system
JP2006511884A5 (en)
CN101546308B (en) Web page search method and web page search system based on overdue retrieval
CN107463641A (en) System and method for improving the access to search result
CN101729288B (en) Method and device for counting network access behaviours of internet users
CN106295382B (en) A kind of Information Risk preventing control method and device
CN106484709A (en) A kind of auditing method of daily record data and audit device
CN104933069A (en) Method and system for analyzing web browsing statistics of desktop terminal
CN107193831A (en) Information recommendation method and device
CN103412940B (en) The method of detection swindle phone
CN107592305A (en) A kind of anti-brush method and system based on elk and redis
CN107835132B (en) Method and device for tracking flow source
CN106656929A (en) Information processing method and apparatus
CN104902498A (en) Identification method and device for subscriber re-networking
CN103077196B (en) A kind of access method from public network WEB website to intranet data storehouse
CN201114128Y (en) Enterprise search engine device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant