CN107291956A - System and method for quickly querying a website's search-engine indexing information - Google Patents
- Publication number
- CN107291956A (CN201710633236.2A)
- Authority
- CN
- China
- Prior art keywords
- keyword
- server
- record
- website
- weights
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Computer And Data Communications (AREA)
Abstract
A system for quickly querying a website's search-engine indexing information comprises a website keyword query system, a keyword distribution system, a keyword indexing query system, and a keyword indexing statistics system. After the website keyword query system obtains the website's keywords, the keyword distribution system distributes them to the servers. The keyword indexing query system sends requests to the search engine based on the keywords held by each server. The keyword indexing statistics system aggregates the indexing data obtained for the keywords and produces a keyword statistics report. Each server comprises an application server, which runs the actual programs, and a proxy server. The website keyword query system takes the address of the website whose indexing is to be queried, then collects and stores the keywords of the pages that the website allows search engines to index. The keyword distribution system monitors how keyword requests are running on each application server and proxy server.
Description
Technical field
The invention belongs to the field of Internet search design and relates to web crawler technology, specifically a system and method for quickly querying a website's search-engine indexing information.
Background
China currently has more than 50 million small and medium-sized enterprises (SMEs), which account for more than 99% of all enterprises in the country. The final products and services they create account for nearly 60% of GDP, and they provide 75% of urban jobs, making them the most active sector of China's economy now and for the future. With the development of the Internet, nearly all of these SMEs need their own website to publish their products and services. In the traditional website development process, "the client website is finished" means that the basic data, such as all pages, products, and articles, is complete; but the site's later operation, its search-engine indexing status, and whether its keyword selection needs optimizing no longer receive any attention.
In fact, a website owner builds a site so that search engines will index it; people around the world can then search for some term in a search engine and locate the site faster and more reliably, which helps the site gain more traffic. However, for a long time after many websites are built and published, their owners do not know the site's indexing status, nor the monthly average search volume, CPC (Cost Per Click, average cost per click), degree of competition, KEI (Key Performance Indicators), and other indexing information for the keywords they chose. A search engine is a system or software module that collects information from the Internet with specific computer programs according to a certain strategy, organizes and processes that information, provides retrieval services to users, and displays the information relevant to a user's query. An API (application programming interface) is a set of predefined functions whose purpose is to give applications and developers access to a set of routines based on some software or hardware.
Thus, helping website owners obtain their site's indexing information promptly and quickly is the key problem for current technology to solve.
Summary of the invention
To solve the above problems, the object of the invention is to provide a system and method for quickly querying a website's search-engine indexing information. The keywords of a client website are crawled and analyzed to obtain all of the site's keywords; then, by simulating user searches, the indexing information for those keywords is obtained from the corresponding search engine; finally, the search engine's API is called to obtain each keyword's monthly average search volume, CPC (Cost Per Click, average cost per click), degree of competition, KEI (Key Performance Indicators), and so on;
The technical solution adopted by the present invention is:
A system for quickly querying a website's search-engine indexing information comprises a website keyword query system, a keyword distribution system, a keyword indexing query system, and a keyword indexing statistics system. After the website keyword query system obtains the website's keywords, the keyword distribution system distributes them to the servers. The keyword indexing query system sends requests to the search engine based on the keywords held by each server and determines each keyword's indexing status by analyzing the returned data. The keyword indexing statistics system aggregates the indexing data obtained for the keywords and produces a keyword statistics report;
Each server comprises an application server and a proxy server: the application server runs the actual programs, and the proxy server provides access to the search engine;
The website keyword query system takes the address of the website whose indexing is to be queried, then collects and stores the keywords of the pages that the website allows search engines to index;
The keyword distribution system monitors how keyword requests are running on each application server and proxy server. Its main functions are: (1) distributing unassigned keywords to idle proxy servers and setting those keywords to the "assigned" state; (2) setting keywords that have not yet been searched on a demoted (weight-reduced) proxy server to the "not searched" state; (3) monitoring whether a demoted proxy server, after its weight has been increased, meets the standard for calling the search engine, and setting proxy servers that meet the standard to the "available" state;
The keyword indexing query system uses "simulated manual requests" together with "machine weight control": it uses each keyword as the search term and analyzes the returned search results to determine the indexing status of the website's keywords in the search engine;
The keyword indexing statistics system performs statistical processing on the data after the indexing query for the client website's keywords has finished. Using MAP-REDUCE, it aggregates and summarizes the keywords and generates a keyword statistics curve chart and a keyword statistics list, so that the client's keywords can be systematically compared and reviewed;
A method for quickly querying a website's search-engine indexing information, comprising the following steps:
Step 1: Website keyword acquisition, performed by the website keyword query system. Specifically: use the website address to fetch the site's robots file and siteMap file, traverse the data in both files, and extract the links that the website allows search engines to index;
Using the link information, fetch the HTML for each link and analyze its content to find the <meta name="keywords" content="XXX"> tag, where the XXX in the content attribute is the keyword content for that link. Using the symbols ", |;;、.." as delimiters, split the keyword content into words, filter out invalid phrases, and extract the core phrases;
Further, taking the core phrases as a base, new words are added before and after each core phrase to form derived phrases;
The core phrases and the derived phrases are the final keyword data for the search-engine indexing query. This data is stored in a storage medium, ready for the keyword distribution system and the indexing query system;
In particular, if there is no robots file, all pages are crawled by default;
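The link-extraction part of Step 1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `indexable_links`, the `user_agent` default, and the choice to pass the two files in as already-downloaded strings are assumptions made for the example. It relies only on Python's standard `urllib.robotparser` and `xml.etree.ElementTree` modules and on the standard sitemap XML namespace.

```python
from urllib import robotparser
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def indexable_links(robots_txt, sitemap_xml, user_agent="*"):
    """Return the sitemap URLs that robots.txt lets `user_agent` fetch.

    `robots_txt` may be None when the site has no robots file; in that
    case every page is treated as crawlable, as the text specifies.
    (Function name and signature are hypothetical.)
    """
    root = ET.fromstring(sitemap_xml)
    urls = [(loc.text or "").strip() for loc in root.iter(SITEMAP_NS + "loc")]
    urls = [u for u in urls if u]
    if robots_txt is None:
        return urls
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]
```

Downloading the two files is left out so that the sketch stays testable offline; in practice they would be fetched from the website address supplied by the client.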
Step 2: Website keyword distribution. The application server accesses the search engine through the proxy server and requests keyword assignments; this step is performed by the keyword distribution system and specifically includes:
Step 201: Poll all proxy servers whose service state is "idle" and check whether each proxy server's weight exceeds the preset minimum keyword-indexing weight. If it exceeds the minimum weight, go to step 202; if it is below the minimum weight, go to step 203;
Step 202: If the weight is greater than the minimum keyword-indexing weight, distribute unassigned keywords to the idle server and set those keywords to the "assigned" state;
Step 203: If the weight is less than the minimum keyword-indexing weight, check when the server's weight was last increased. If the time elapsed since the last increase is greater than or equal to the preset weight-increase interval, increase the weight on top of the server's current weight. If the increased weight is greater than or equal to the minimum keyword-indexing weight, assign some keywords to the current machine; if it is still below the minimum keyword-indexing weight, poll the next idle proxy server;
Step 3: Keyword indexing query, performed by the keyword indexing query system. Poll the keywords that have been assigned to the proxy servers; using "simulated manual requests" with each polled keyword as the search term, the keyword indexing query system sends a request to the search engine and performs one of the following operations depending on the data the search engine returns:
Step 3-1: When the search engine refuses the request, increment the server's refusal count by 1 and check whether the refusal count has reached the peak value. If the peak has been reached, terminate the current application server's access, set the keywords assigned to the current proxy server whose indexing query has not completed to the "unassigned" state, and set the proxy server's weight = the proxy server's current weight / 2. If the peak has not been reached, the proxy server enters a dormant state and sends the request to the search engine again after the dormancy ends;
When a proxy server is demoted because its refusal count reached the peak, then, to keep the search engine from noticing that the server is accessing it frequently, by default at least one hour must pass before it can perform "keyword indexing queries" again. The size of each weight increase is set according to how quickly the server should return to a working state, using one hour as the basis: the difference between the minimum weight and the server's current weight is distributed evenly across that hour;
In particular, when a proxy server is refused by the search engine and its refusal count reaches the specified peak, and the server's weight is not 0, the new weight = current weight / 2, and the time of the demotion is recorded. At fixed intervals, the demoted server's weight is increased by a certain amount. If the increased weight is high enough for the server to call the search engine, the server's state is set to "available"; otherwise the weight-increase check waits until the server is next polled;
In particular, when a proxy server is refused by the search engine and its refusal count reaches the specified peak, and the server's weight is 0, the current server is set to the "unavailable" state and operations staff are notified;
In particular, keywords on a demoted server that have not yet been searched are set to the "not searched" state;
Step 3-2: When the search engine returns normally, the keyword indexing query system parses the returned set of website domain names, which is presented as multiple pages of HTML results. On the first HTML page, check whether the address of the website being queried is in the returned set of domain names. If so, store the currently returned domain name data in the storage medium; if not, analyze the set of domain names on the next HTML page, continuing until the address of the website being queried is found or the last HTML page is reached. For a keyword whose indexing query has completed, change its indexing state to "completed" and decrement by 1 the refusal count of the proxy server handling that keyword;
For keywords whose indexing state is "completed", call the search engine's API to obtain the keyword's monthly average search volume, CPC (Cost Per Click, average cost per click), degree of competition, and KPI (Key Performance Indicators);
In particular, the server's dormancy period adapts to its refusal count: the more times the server has been refused, the longer its dormancy period;
Step 4: Keyword statistics, performed by the keyword indexing statistics system. Specifically: after the indexing query for the website's keywords has finished, perform statistical processing on the data. Using MAP-REDUCE, aggregate and summarize the keywords and generate a keyword statistics curve chart and a keyword statistics list, so that the keywords can be systematically compared and reviewed.
The present invention has the following beneficial effects:
(1) The invention obtains all of a website's keywords by crawling the site's robots file and siteMap file. By combining simulated manual search with weight control and simulating browser requests to obtain the search engine's results, it automatically and quickly captures the information about which of the site's keywords the search engine has indexed, so the keywords' ranking performance in the search engine can be understood promptly and clearly;
(2) The invention assigns a weight to each server and, by controlling those weights, controls how website keywords are distributed among the servers and how the servers call the search engine. This forms a sound and effective system for keyword crawling and indexing retrieval, helping webmasters grasp their site's indexing status conveniently and quickly and then optimize the site's keywords in time;
Brief description of the drawings
Fig. 1 is a structural diagram of the system for quickly querying a website's search-engine indexing information in an embodiment of the invention;
Fig. 2 is a flowchart of the method for quickly querying a website's search-engine indexing information in an embodiment of the invention;
Fig. 3 is a schematic flowchart of website keyword distribution in an embodiment of the invention;
Fig. 4 is a schematic flowchart of the keyword indexing query in an embodiment of the invention.
Embodiment
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a structural diagram of the system for quickly querying a website's search-engine indexing information in an embodiment of the invention. The system comprises servers 101, a website keyword query system 102, a keyword distribution system 103, a keyword indexing query system 104, and a keyword indexing statistics system 105. After the website keyword query system 102 obtains the website's keywords, the keyword distribution system 103 distributes them to the servers 101. The indexing query system 104 sends requests to the search engine based on the keywords held by each server 101 and determines each keyword's indexing status by analyzing the returned data. The keyword indexing statistics system 105 aggregates the indexing data obtained for the keywords and produces a keyword statistics report;
The server 101 comprises an application server 101-1 and a proxy server 101-2: the application server 101-1 runs the actual programs, and the proxy server 101-2 provides access to the search engine;
The website keyword query system 102 takes the address of the website whose indexing is to be queried, then collects and stores the keywords of the pages that the website allows search engines to index;
The keyword distribution system 103 monitors how keyword requests are running on each application server and proxy server. Its main functions are: (1) distributing unassigned keywords to idle proxy servers and setting those keywords to the "assigned" state; (2) setting keywords that have not yet been searched on a demoted (weight-reduced) proxy server to the "not searched" state; (3) monitoring whether a demoted proxy server, after its weight has been increased, meets the standard for calling the search engine, and setting proxy servers that meet the standard to the "available" state;
The keyword indexing query system 104 uses "simulated manual requests" together with "machine weight control": it uses each keyword as the search term and analyzes the returned search results to determine the indexing status of the website's keywords in the search engine;
The keyword indexing statistics system 105 performs statistical processing on the data after the indexing query for the client website's keywords has finished. Using MAP-REDUCE, it aggregates and summarizes the keywords and generates a keyword statistics curve chart and a keyword statistics list, so that the client's keywords can be systematically compared and reviewed;
Fig. 2 is a flowchart of the method for quickly querying a website's search-engine indexing information in an embodiment of the invention. The keywords of the client website are crawled and analyzed to obtain all of the site's keywords; then, by simulating user searches, the indexing information for those keywords is obtained from the corresponding search engine; finally, the search engine's API is called to obtain each keyword's monthly average search volume, CPC (Cost Per Click, average cost per click), degree of competition, and KEI (Key Performance Indicators). The specific flow includes:
Step 201: Website keyword acquisition. The website keyword query system uses the website address to fetch the site's robots file and siteMap file, traverses the data in both files, and extracts the links that the website allows search engines to index. If there is no robots file, all pages are crawled by default;
Using the link information, fetch the HTML for each link and analyze its content to find the <meta name="keywords" content="XXX"> tag, where the XXX in the content attribute is the keyword content for that link. Using ", |;;、.." as delimiters, split the content value and filter out invalid words, such as company-name terms like "Co.Ltd", to obtain the core phrases. Further, taking the core phrases as a base, qualifiers such as China or Manufacturer are added before and after each core phrase to form derived phrases;
All core phrases and derived phrases together form the keyword data for the search-engine query. This data is stored in a storage medium, ready for the keyword distribution system and the indexing query system;
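The keyword extraction and derivation described above might look like the sketch below. It is illustrative only: the delimiter set, the stop list `INVALID_WORDS`, and the helper names are assumptions, since the text does not give the exact delimiter characters or filter list. The ASCII period is deliberately left out of the delimiters here so that a term such as "Co.Ltd" survives splitting and can be filtered as a whole. The two qualifiers are the ones the text names.

```python
import re
from html.parser import HTMLParser

DELIMITERS = re.compile(r"[,，|;；、。]")    # assumed delimiter set
INVALID_WORDS = {"co.ltd", "company"}       # hypothetical stop list
QUALIFIERS = ("China", "Manufacturer")      # qualifiers named in the text


class _KeywordsMeta(HTMLParser):
    """Captures the content attribute of <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.content = ""

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "keywords":
            self.content = attr.get("content") or ""


def core_phrases(html):
    """Split the keywords meta content, drop invalid words and duplicates."""
    parser = _KeywordsMeta()
    parser.feed(html)
    seen, result = set(), []
    for part in DELIMITERS.split(parser.content):
        phrase = part.strip()
        key = phrase.lower()
        if phrase and key not in INVALID_WORDS and key not in seen:
            seen.add(key)
            result.append(phrase)
    return result


def derived_phrases(cores, qualifiers=QUALIFIERS):
    """Put each qualifier before and after each core phrase."""
    out = []
    for core in cores:
        for q in qualifiers:
            out.append(f"{q} {core}")
            out.append(f"{core} {q}")
    return out
```

For a page whose keywords are "LED lamp, Co.Ltd; solar panel", this yields the core phrases "LED lamp" and "solar panel", and derived phrases such as "China LED lamp" and "LED lamp Manufacturer".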
Step 202: The application server accesses the search engine through the proxy server and requests keyword assignments; this step is performed by the keyword distribution system. The specific flow, shown in Fig. 3 (the schematic flowchart of website keyword distribution in an embodiment of the invention), includes:
Step 202-1: Confirm the numbers of application servers 101-1 and proxy servers 101-2, and distribute the proxy servers evenly among the application servers;
Step 202-2: Poll all proxy servers whose service state is "idle" and check whether each proxy server's weight exceeds the preset minimum keyword-indexing weight. If it exceeds the minimum weight, go to step 202-3; if it is below the minimum weight, go to step 202-4. Here, a service state of "idle" means that none of the proxy server's keywords is currently undergoing an indexing query;
Step 202-3: If the weight is greater than the minimum keyword-indexing weight, distribute unassigned keywords to the idle proxy server and set those keywords to the "assigned" state;
Step 202-4: If the weight is less than the minimum keyword-indexing weight, check when the server's weight was last increased. If the time elapsed since the last increase is greater than or equal to the preset weight-increase interval, increase the weight on top of the server's current weight. If the increased weight is greater than or equal to the minimum keyword-indexing weight, assign some keywords to the current machine; if it is still below the minimum keyword-indexing weight, poll the next idle proxy server.
In particular, when a proxy server is refused by the search engine and its refusal count reaches the specified peak, and the server's weight is not 0, the new weight = current weight / 2, and the time of the demotion is recorded. At fixed intervals, the demoted server's weight is increased by a certain amount. If the increased weight is high enough for the server to call the search engine, the server's state is set to "available"; otherwise the weight-increase check waits until the server is next polled;
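One polling pass of steps 202-2 to 202-4 can be sketched as below. The `Proxy` class, the `allocate` function, the batch size, and the default numbers are all hypothetical; the text does not say how many keywords "some keywords" means, and it leaves open whether a weight exactly equal to the minimum qualifies in step 202-2, so this sketch treats "greater than or equal" as sufficient throughout.

```python
from dataclasses import dataclass, field


@dataclass
class Proxy:
    name: str
    weight: int
    idle: bool = True
    last_increase: float = 0.0          # time of the last weight increase, seconds
    keywords: list = field(default_factory=list)


def allocate(proxies, pending, now, min_weight=4000,
             increase_interval=180.0, increment=75, batch=2):
    """Run one polling pass; return the keywords that remain unassigned."""
    pending = list(pending)
    for proxy in proxies:
        if not proxy.idle or not pending:
            continue
        # Step 202-4: a demoted proxy regains weight once the interval has passed.
        if proxy.weight < min_weight and now - proxy.last_increase >= increase_interval:
            proxy.weight += increment
            proxy.last_increase = now
        # Step 202-3: only proxies at or above the minimum weight receive keywords.
        if proxy.weight >= min_weight:
            proxy.keywords.extend(pending[:batch])   # these become "assigned"
            del pending[:batch]
            proxy.idle = False
    return pending
```

A proxy that is still below the minimum weight after its increase is simply skipped, which corresponds to polling the next idle proxy server.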
Step 203: Keyword indexing query. The keyword indexing query system polls the keywords that have been assigned to the proxy servers; using "simulated manual requests" with each polled keyword as the search term, it sends a request to the search engine and performs one of the following operations depending on the data the search engine returns:
Step 203-1: When the search engine refuses the request, increment the server's refusal count by 1 and check whether the refusal count has reached the peak value. If the peak has been reached, terminate the current application server's access, set the keywords assigned to the current proxy server whose indexing query has not completed to the "not searched" state, and set the proxy server's weight = the proxy server's current weight / 2. If the peak has not been reached, the proxy server enters a dormant state and sends the request to the search engine again after the dormancy ends;
When a proxy server is demoted because its refusal count reached the peak, then, to keep the search engine from noticing that the server is accessing it frequently, by default at least one hour must pass before it can perform "keyword indexing queries" again. The size of each weight increase is set according to how quickly the server should return to a working state, using one hour as the basis: the difference between the minimum weight and the server's current weight is distributed evenly across that hour. For example: the default weight is 5000 and the minimum allocation weight is 4000. On failure the weight becomes 2500, and the server can be assigned keywords again only after one hour. With the weight increased every 3 minutes, there are 20 increases in one hour (60 min / 3 min), so each increase is 1500/20 = 75 weight units;
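The arithmetic in this example generalizes to a one-line formula. The sketch below, with a hypothetical function name and parameter names, reproduces the figures from the text: a demoted weight of 2500, a minimum allocation weight of 4000, and an increase every 3 minutes over a one-hour window.

```python
def recovery_increment(min_weight, current_weight,
                       window_minutes=60, step_minutes=3):
    """Weight to add at each step so that current_weight reaches
    min_weight exactly when the recovery window ends."""
    steps = window_minutes // step_minutes      # 60 / 3 = 20 increases
    return (min_weight - current_weight) / steps


increment = recovery_increment(4000, 2500)      # (4000 - 2500) / 20
# Weight after each of the 20 increases; the last entry is the minimum weight.
weights = [2500 + increment * i for i in range(1, 21)]
```

Spreading the difference evenly means the server crosses the minimum weight only at the end of the hour, which is what enforces the one-hour pause.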
Step 203-2: When the search engine returns normally, the keyword indexing query system parses the returned set of website domain names, which is presented as multiple pages of HTML results. On the first HTML page, check whether the address of the website being queried is in the returned set of domain names. If so, store the currently returned domain name data in the storage medium; if not, analyze the set of domain names on the next HTML page, continuing until the address of the website being queried is found or the last HTML page is reached. For a keyword whose indexing query has completed, change its indexing state to "completed" and decrement by 1 the refusal count of the proxy server handling that keyword;
For keywords whose indexing state is "completed", call the search engine's API to obtain the keyword's monthly average search volume, CPC (Cost Per Click, average cost per click), degree of competition, and KPI (Key Performance Indicators).
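The page-by-page check in step 203-2 can be sketched as follows. The function name `find_indexed` and the input shape are assumptions: each results page is represented as a list of result URLs, and fetching and parsing the search engine's HTML is left out because it depends on the particular engine.

```python
from urllib.parse import urlparse


def find_indexed(target_url, result_pages):
    """Scan results pages in order until one lists the target site.

    `result_pages` is any iterable of URL lists, one list per results
    page. Returns (page_number, urls_on_that_page), or None if no page
    lists the target. `target_url` must include a scheme such as https://.
    """
    target = urlparse(target_url).hostname
    if target is None:
        raise ValueError("target_url must include a scheme and host")
    for page_no, urls in enumerate(result_pages, start=1):
        hosts = {urlparse(u).hostname for u in urls}
        if target in hosts:
            return page_no, urls        # data to store for this keyword
    return None
```

Because `result_pages` can be a generator, later pages need not be requested once the site is found, which keeps the number of requests to the search engine low.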
In particular, the server's dormancy period adapts to its refusal count: the more times the server has been refused, the longer its dormancy period;
In particular, when a proxy server is refused by the search engine and its refusal count reaches the specified peak, and the server's weight is 0, the current server is set to the "unavailable" state and operations staff are notified;
In particular, keywords on a demoted server that have not yet been searched are set to the "not searched" state;
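The refusal handling of step 203-1 and the special cases above can be combined into a small state sketch. The class and function names, the peak of 3, the 30-second base dormancy, the linear growth of the dormancy period, and the "demoted" status label are assumptions for illustration; the text only states that dormancy lengthens with the refusal count and that the weight is halved at the peak. Whether the refusal count is reset after a demotion is not stated, so it is left unchanged here.

```python
from dataclasses import dataclass


@dataclass
class ProxyState:
    weight: int
    refusals: int = 0
    status: str = "available"


def on_refused(proxy, peak=3, base_sleep=30.0):
    """Handle one refused request.

    Returns the number of seconds to stay dormant before retrying, or
    None when the refusal count has reached the peak and querying stops.
    """
    proxy.refusals += 1
    if proxy.refusals < peak:
        # Dormancy grows with the number of refusals (linear growth assumed).
        return base_sleep * proxy.refusals
    if proxy.weight == 0:
        proxy.status = "unavailable"    # operations staff would be notified here
    else:
        proxy.weight //= 2              # new weight = current weight / 2
        proxy.status = "demoted"
    return None


def on_success(proxy):
    """A completed indexing query reduces the refusal count by 1."""
    proxy.refusals = max(0, proxy.refusals - 1)
```

Starting from a weight of 5000, three consecutive refusals give dormancy periods of 30 and 60 seconds and then a demotion to 2500, matching the failure case in the worked weight example.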
Step 204: Keyword statistics. After the indexing query for the website's keywords has finished, the keyword indexing statistics system performs statistical processing on the data. Using MAP-REDUCE, it aggregates and summarizes the keywords and generates a keyword statistics curve chart and a keyword statistics list, so that the keywords can be systematically compared and reviewed.
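The text does not say what is mapped and reduced, so the following is only one possible reading: each query result is mapped to a (keyword, counts) pair, and the pairs are reduced into per-keyword totals that could feed the statistics list or chart. The record format with `keyword` and `indexed` fields and all function names are hypothetical, and the sketch runs in a single process rather than on a distributed MapReduce framework.

```python
from functools import reduce


def map_record(record):
    """Map one query result to (keyword, (indexed_hits, total_queries))."""
    return record["keyword"], (1 if record["indexed"] else 0, 1)


def reduce_counts(acc, pair):
    """Fold one mapped pair into the running per-keyword totals."""
    keyword, (hit, total) = pair
    hits_so_far, total_so_far = acc.get(keyword, (0, 0))
    acc[keyword] = (hits_so_far + hit, total_so_far + total)
    return acc


def keyword_report(records):
    """Return {keyword: (times_indexed, times_queried)}."""
    return reduce(reduce_counts, map(map_record, records), {})
```

Running the query repeatedly over time and reducing per date as well as per keyword would give the series needed for a curve chart.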
Those of ordinary skill in the art should understand that the foregoing is only a specific embodiment of the invention and is not intended to limit the invention. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (2)
1. A system for quickly querying a website's search-engine indexing information, characterized in that it comprises a website keyword query system, a keyword distribution system, a keyword indexing query system, and a keyword indexing statistics system; after the website keyword query system obtains the website's keywords, the keyword distribution system distributes them to the servers; the keyword indexing query system sends requests to the search engine based on the keywords held by each server and determines each keyword's indexing status by analyzing the returned data; the keyword indexing statistics system aggregates the indexing data obtained for the keywords and produces a keyword statistics report;
Each server comprises an application server and a proxy server: the application server runs the actual programs, and the proxy server provides access to the search engine;
The website keyword query system takes the address of the website whose indexing is to be queried, then collects and stores the keywords of the pages that the website allows search engines to index;
The keyword distribution system monitors how keyword requests are running on each application server and proxy server: (1) it distributes unassigned keywords to idle proxy servers and sets those keywords to the "assigned" state; (2) it sets keywords that have not yet been searched on a demoted (weight-reduced) proxy server to the "not searched" state; (3) it monitors whether a demoted proxy server, after its weight has been increased, meets the standard for calling the search engine, and sets proxy servers that meet the standard to the "available" state;
The keyword indexing query system uses "simulated manual requests" together with "machine weight control": it uses each keyword as the search term and analyzes the returned search results to determine the indexing status of the website's keywords in the search engine;
The keyword indexing statistics system performs statistical processing on the data after the indexing query for the client website's keywords has finished; using MAP-REDUCE, it aggregates and summarizes the keywords and generates a keyword statistics curve chart and a keyword statistics list, so that the keywords can be systematically compared and reviewed.
2. the method that record information is searched in the quick search website according to claim, it is characterized in that specific steps include:
Step 1: Website keyword acquisition, performed by the website keyword query system. Specifically: the website's robots file and siteMap file are obtained from the website address; the data in the robots file and the siteMap file is traversed, and the links that the website permits search engines to index are extracted;
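The link extraction in this step can be illustrated with a minimal sketch. It assumes the robots file and siteMap file have already been downloaded as text, and that the siteMap follows the common sitemaps.org XML layout; the function name `indexable_links` and its parameters are hypothetical and do not come from the patent.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by sitemaps.org-style siteMap files (an assumption about the format).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def indexable_links(robots_txt, sitemap_xml, user_agent="*"):
    """Return the siteMap URLs that the robots rules allow user_agent to fetch."""
    rules = RobotFileParser()
    # A missing or empty robots file produces no rules, so every URL is allowed.
    rules.parse((robots_txt or "").splitlines())

    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]
    return [url for url in urls if rules.can_fetch(user_agent, url)]
```

Passing an empty string for `robots_txt` models the case where the site has no robots file: nothing is filtered out.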
According to the link information, the HTML corresponding to each link is fetched and its content is analyzed to find the <meta name="keywords" content="XXX"> tag; the XXX in the content attribute of the tag is the keyword content for that link. Using the symbols ", |;;、.." as delimiters, the keyword content is split into phrases, invalid phrases are filtered out, and the core phrases are extracted;
Further, based on the core phrases, new words are added before and after each core phrase to form derivative phrases;
The core phrases and the derivative phrases are the keyword data finally used for the search-engine indexing query; this data is stored in a storage medium in preparation for the keyword distribution system and the indexing query system;
If there is no robots file, all pages are by default treated as crawlable;
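A minimal sketch of the keyword extraction described in Step 1 follows. The delimiter set, the stop-phrase filter, and the prefix/suffix lists are assumptions chosen for illustration (the symbol list in the claim is partly garbled in this text), and the names `extract_core_phrases` and `derive_phrases` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumed delimiters: ASCII and full-width commas, semicolons, pipes and stops.
DELIMITERS = r"[,，|;；、。.]+"


class MetaKeywordsParser(HTMLParser):
    """Collects the content attribute of every <meta name="keywords"> tag."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords" and attr.get("content"):
            self.contents.append(attr["content"])


def extract_core_phrases(html, stop_phrases=frozenset()):
    """Split the keywords meta content and drop empty, invalid and repeated phrases."""
    parser = MetaKeywordsParser()
    parser.feed(html)
    phrases = []
    for content in parser.contents:
        for part in re.split(DELIMITERS, content):
            part = part.strip()
            if part and part not in stop_phrases and part not in phrases:
                phrases.append(part)
    return phrases


def derive_phrases(core_phrases, prefixes=(), suffixes=()):
    """Build derivative phrases by adding words before and after each core phrase."""
    derived = []
    for phrase in core_phrases:
        derived += [f"{prefix} {phrase}" for prefix in prefixes]
        derived += [f"{phrase} {suffix}" for suffix in suffixes]
    return derived
```

How "invalid" phrases are recognized is not specified in the claim; the sketch uses a simple exclusion set as a stand-in.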
Step 2: Website keyword distribution. The application server accesses the search engine through the proxy server and requests keywords to be distributed; this step is performed by the keyword distribution system and specifically includes:
Step 201: Poll all proxy servers whose service state is "idle" and determine whether the proxy server's weight exceeds the preset minimum weight for keyword indexing queries; if it exceeds the minimum weight, go to step 202; if it is below the minimum weight, go to step 203;
Step 202: If the weight is greater than the minimum weight for keyword indexing queries, distribute unassigned keywords to the idle server and set those keywords to the "assigned" state;
Step 203: If the weight is less than the minimum weight for keyword indexing queries, check when the server's weight was last increased; if the interval between the last weight increase and the current time is greater than or equal to the preset weight-increase interval, increase the weight on top of the server's current weight; if the weight after the increase is greater than or equal to the minimum weight, distribute some keywords to the current machine; if the weight after the increase is still below the minimum weight, poll the next idle proxy server;
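Steps 201 to 203 amount to a weight-gated polling loop, which can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the `Proxy` class, the numeric defaults (`min_weight`, `increase_interval`, `increment`, `batch`), and the choice to treat a weight equal to the minimum as eligible are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Proxy:
    name: str
    weight: float
    last_increase: float = 0.0  # timestamp (seconds) of the last weight increase
    idle: bool = True
    assigned: list = field(default_factory=list)


def distribute(proxies, keywords, now, min_weight=10.0,
               increase_interval=600.0, increment=1.0, batch=2):
    """Assign unassigned keywords to idle proxies whose weight is high enough.

    keywords maps each keyword to its state ("unassigned" or "assigned").
    """
    for proxy in proxies:
        if not proxy.idle:
            continue
        if proxy.weight < min_weight:
            # Step 203: raise the weight only if enough time has passed.
            if now - proxy.last_increase >= increase_interval:
                proxy.weight += increment
                proxy.last_increase = now
            if proxy.weight < min_weight:
                continue  # still too low: poll the next idle proxy
        # Step 202: hand out a batch of unassigned keywords.
        pending = [k for k, state in keywords.items() if state == "unassigned"][:batch]
        for keyword in pending:
            keywords[keyword] = "assigned"
            proxy.assigned.append(keyword)
        if pending:
            proxy.idle = False
```

The batch size stands in for "distribute some keywords"; the claim does not say how many keywords a server receives per round.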
Step 3: Keyword indexing query, performed by the keyword indexing query system: poll the keywords that have been distributed to proxy servers; by way of "simulated manual requests", with each polled keyword as the search condition, the keyword indexing query system sends a request to the search engine and performs the specified operation according to the data the search engine returns:
Step 3-1: When the search engine rejects the access request, the server's rejected-access count is increased by 1, and it is determined whether the rejected-access count has reached the peak; if the peak is reached, access by the current application server is terminated, the keywords distributed to the current proxy server whose indexing query is unfinished are set to the "unassigned" state, and the proxy server's weight is set to the current proxy server's weight / 2; if the peak is not reached, the proxy server enters a dormant state and sends the request to the search engine again after the dormancy ends;
When a proxy server is demoted because its rejected-access count has reached the peak, then, to keep the search engine from noticing frequent access by the server, "keyword indexing query" may by default be carried out again only after at least one hour; each weight increment is set according to how quickly the server should return to the working state, using 1 hour as the basis of calculation, that is, the difference between the minimum weight and the server's current weight is distributed evenly over 1 hour;
When a proxy server is rejected by the search engine, the rejection count reaches the specified peak, and the server weight is not 0, the new weight = current weight / 2, and the time of the current weight reduction is recorded; at fixed intervals, the weight of the demoted server is increased by a small amount; if the weight after the increase is high enough to call the search engine, the server state is set to "available"; otherwise the weight-increase check is made again the next time the server is polled;
In particular, when a proxy server is rejected by the search engine, the rejection count reaches the specified peak, and the server weight is 0, the current server is set to the "unavailable" state and the operations staff are notified;
Keywords on the demoted server that have not yet been submitted to the search engine are set to the "not searched" state;
Step 3-2: When the search engine returns normally, the keyword indexing query system parses the returned set of website domain names, which is presented as multiple HTML pages; in the first HTML page, it checks whether the address of the website whose indexing status is being queried appears in the returned set of website domain names; if so, the currently returned website domain name data is stored in the storage medium; if not, the set of website domain names in the next HTML page is fetched and analyzed, until the address of the website being queried is found or the last HTML page is reached; for a keyword whose indexing query is complete, the keyword's indexing state is changed to "complete", and the rejected-access count of the proxy server handling the keyword is reduced by 1;
For keywords whose indexing state is "complete", the search engine's API is called to obtain the keyword's monthly ranking and search volume, CPC (average cost per click), degree of competition, and KPI (key performance indicators);
The server's dormancy period changes intelligently with the server's rejected-access count: the higher the rejected-access count, the longer the server's dormancy period;
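The two branches of Step 3 (rejection handling and result-page scanning) can be sketched together as follows. This is a hedged illustration only: the `ProxyState` class, the state strings, the linear dormancy formula, the peak value, and the representation of result pages as lists of URLs are assumptions, since the claim gives no concrete values or data formats.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class ProxyState:
    weight: float
    rejected: int = 0
    status: str = "available"
    demoted_at: Optional[float] = None


def on_rejected(proxy, keyword_states, now, peak=3):
    """Step 3-1: update the proxy after a rejection and return the next action."""
    proxy.rejected += 1
    if proxy.rejected < peak:
        return "sleep"  # caller sleeps, then retries the request
    if proxy.weight == 0:
        proxy.status = "unavailable"
        return "notify_ops"
    proxy.weight /= 2          # halve the weight on demotion
    proxy.demoted_at = now     # remember when the weight was reduced
    proxy.status = "demoted"
    for keyword, state in keyword_states.items():
        if state == "assigned":  # unfinished keywords go back to the pool
            keyword_states[keyword] = "unassigned"
    return "demoted"


def recovery_increment(min_weight, current_weight, ticks_per_hour):
    """Spread the gap to the minimum weight evenly over one hour."""
    return max(min_weight - current_weight, 0.0) / ticks_per_hour


def sleep_seconds(rejected_count, base=30.0):
    """Assumed linear rule: more rejections give a longer dormancy."""
    return base * (1 + rejected_count)


def host_of(url):
    parsed = urlparse(url if "://" in url else "//" + url)
    host = (parsed.hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def find_site(result_pages, target_url):
    """Step 3-2: scan result pages in order; stop at the first page with a match."""
    target = host_of(target_url)
    for page_number, urls in enumerate(result_pages, start=1):
        hits = [url for url in urls if host_of(url) == target]
        if hits:
            return page_number, hits
    return None


def on_success(proxy, keyword_states, keyword):
    """Mark the keyword complete and lower the rejection count (floored at 0)."""
    keyword_states[keyword] = "complete"
    proxy.rejected = max(proxy.rejected - 1, 0)
```

Flooring the rejection count at zero and matching on host names with a leading "www." removed are design choices of this sketch; the claim only says that the count is reduced by 1 and that the website address is looked up in the returned set.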
Step 4: Keyword statistics, performed by the keyword indexing statistics system. Specifically: after the indexing query for the website's keywords has finished, statistical processing is performed on the data; using MAP-REDUCE, the keywords are aggregated and summarized, keyword statistics curve charts and keyword statistics lists are generated, and the keywords can be systematically compared and reviewed.
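The MAP-REDUCE aggregation named in Step 4 can be illustrated with a small in-process sketch. The record fields (`keyword`, `date`, `indexed`) are hypothetical, and the code runs in a single process rather than on a distributed framework; it only shows the map, group, and reduce phases that would produce the points of a statistics curve.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records):
    """Emit (date, (checked, indexed)) pairs, one per query result."""
    for record in records:
        yield record["date"], (1, 1 if record["indexed"] else 0)


def reduce_phase(pairs):
    """Group the pairs by date and sum the counters for each date."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {
        key: reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
        for key, values in sorted(grouped.items())
    }


def indexing_curve(records):
    """Return {date: (keywords checked, keywords indexed)} in date order."""
    return reduce_phase(map_phase(records))
```

Each resulting entry gives one point of the curve: how many keywords were checked on a date and how many of them were found in the search engine.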
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710633236.2A CN107291956B (en) | 2017-07-28 | 2017-07-28 | The system and method for record information is searched in a kind of quick search website |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710633236.2A CN107291956B (en) | 2017-07-28 | 2017-07-28 | The system and method for record information is searched in a kind of quick search website |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107291956A (en) | 2017-10-24 |
CN107291956B (en) | 2018-06-22 |
Family
ID=60102580
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710633236.2A Active CN107291956B (en) | 2017-07-28 | 2017-07-28 | The system and method for record information is searched in a kind of quick search website |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107291956B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060053092A1 (en) * | 2004-09-01 | 2006-03-09 | Chris Foo | Method and system to perform dynamic search over a network |
CN103902725A (en) * | 2014-04-10 | 2014-07-02 | 百度在线网络技术(北京)有限公司 | Method and device for acquiring search engine optimization information |
Non-Patent Citations (2)
Title |
---|
OPOKU-MENSAH EUGENE 等: "Towards ranking cultural terms from originating source", 《IEEE》 * |
张雄伟: "汽车配件网站 SEO", 《信息与电脑》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107609203A (en) * | 2017-11-07 | 2018-01-19 | 安徽斯百德信息技术有限公司 | A kind of data analysis system and method for search engine optimization effect quantitative evaluation |
CN110348940A (en) * | 2019-05-28 | 2019-10-18 | 成都美美臣科技有限公司 | A kind of method that e-commerce website search is suggested |
CN111611508A (en) * | 2020-05-28 | 2020-09-01 | 江苏易安联网络技术有限公司 | Identification method and device for actual website access of user |
CN111611508B (en) * | 2020-05-28 | 2020-12-15 | 江苏易安联网络技术有限公司 | Identification method and device for actual website access of user |
CN113010636A (en) * | 2021-02-23 | 2021-06-22 | 玉米社(深圳)网络科技有限公司 | Method for rapidly detecting ranking of all keywords of website |
Also Published As
Publication number | Publication date |
---|---|
CN107291956B (en) | 2018-06-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107291956B (en) | The system and method for record information is searched in a kind of quick search website | |
JP6582085B2 (en) | Method and apparatus for generating web page content | |
CN102663048B (en) | Method and device for providing search result | |
Bachlechner et al. | Web service discovery-a reality check | |
CN102932206B (en) | The method and system of monitoring website access information | |
CN104216921B (en) | A kind of addition reminding method, apparatus and system for realizing quick links in browser | |
CN102663617A (en) | Method and system for prediction of advertisement clicking rate | |
CN106126648B (en) | It is a kind of based on the distributed merchandise news crawler method redo log | |
CN102043783A (en) | Data updating method, device and system | |
CN105512153A (en) | Method and device for service provision of online customer service system, and system | |
JP2006511884A5 (en) | ||
CN101546308B (en) | Web page search method and web page search system based on overdue retrieval | |
CN107463641A (en) | System and method for improving the access to search result | |
CN101729288B (en) | Method and device for counting network access behaviours of internet users | |
CN106295382B (en) | A kind of Information Risk preventing control method and device | |
CN106484709A (en) | A kind of auditing method of daily record data and audit device | |
CN104933069A (en) | Method and system for analyzing web browsing statistics of desktop terminal | |
CN107193831A (en) | Information recommendation method and device | |
CN103412940B (en) | The method of detection swindle phone | |
CN107592305A (en) | A kind of anti-brush method and system based on elk and redis | |
CN107835132B (en) | Method and device for tracking flow source | |
CN106656929A (en) | Information processing method and apparatus | |
CN104902498A (en) | Identification method and device for subscriber re-networking | |
CN103077196B (en) | A kind of access method from public network WEB website to intranet data storehouse | |
CN201114128Y (en) | Enterprise search engine device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |