CN105630673B - Automated testing method and device for the crawl completeness rate of a web crawler - Google Patents
- Publication number
- CN105630673B (application CN201510957702.3A)
- Authority
- CN
- China
- Prior art keywords
- crawler program
- target link
- crawl completeness rate
- automatic test
- test device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3612—Software analysis for verifying properties of programs by runtime analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
- G06F11/3672—Test management
- G06F11/3688—Test management for test execution, e.g. scheduling of test suites
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Debugging And Monitoring (AREA)
Abstract
An embodiment of the invention discloses an automated testing method and device for the crawl completeness rate of a web crawler. The method comprises: accessing the web page associated with a seed URL read from the crawler seed library of a crawler program, and obtaining a set number of candidate links from the link attributes of the web page; screening the candidate links to obtain target links, and importing the target links into a testing tool; and determining the crawl completeness rate of the crawler program according to the target links imported into the testing tool and the crawl result data of the crawler program. Compared with the prior art, in which the performance of a crawler program is verified manually, the technical solution of the embodiment improves the efficiency of testing the crawler program.
Description
Technical field
Embodiments of the present invention relate to the field of software testing technology, and in particular to an automated testing method and device for the crawl completeness rate of a web crawler.
Background
In recent years the number of Internet users in China has grown explosively, and websites of every kind have sprung up like mushrooms after rain. Faced with such a mass of information, valuable data must be screened and extracted: governments can use it for public-opinion analysis and network security monitoring, and enterprises can use it for market research and media analysis.
Against this background of explosive information growth, web crawler technology is particularly important. Whether a crawler program crawls the desired information in time, whether the crawled web data is complete, and whether the information is correct are important indicators of product performance. However, manually verifying whether the data of thousands of web pages has been crawled in a timely, complete and correct manner, and whether all of it has been stored, is time-consuming and laborious. There is therefore an urgent need for a test method that can automatically test the crawl completeness rate of a web crawler, so as to improve the efficiency of testing crawler programs.
Summary of the invention
The present invention provides an automated testing method and device for the crawl completeness rate of a web crawler, so as to improve the efficiency of testing crawler programs.
In a first aspect, an embodiment of the invention provides an automated testing method for the crawl completeness rate of a web crawler, comprising:
accessing the web page associated with a seed URL read from the crawler seed library of a crawler program, and obtaining a set number of candidate links from the link attributes of the web page;
screening the candidate links to obtain target links, and importing the target links into a testing tool;
determining the crawl completeness rate of the crawler program according to the target links imported into the testing tool and the crawl result data of the crawler program.
In a second aspect, an embodiment of the invention provides an automated test device for the crawl completeness rate of a web crawler, comprising:
a candidate link module, configured to access the web page associated with a seed URL read from the crawler seed library of a crawler program, and to obtain a set number of candidate links from the link attributes of the web page;
a target link module, configured to screen the candidate links to obtain target links, and to import the target links into a testing tool;
a crawl completeness rate module, configured to determine the crawl completeness rate of the crawler program according to the target links imported into the testing tool and the crawl result data of the crawler program.
In the technical solution provided by the embodiments of the invention, the web pages associated with the seed URLs in the crawler seed library of the crawler program are accessed, target links are screened out of the link attributes of those pages, and the crawl completeness rate of the crawler program is determined from the target links and the crawl result data of the crawler program. Compared with the prior art, in which the performance of a crawler program is verified manually, this improves the efficiency of testing the crawler program.
Description of the drawings
Fig. 1a is a flowchart of an automated testing method for the crawl completeness rate of a web crawler in Embodiment 1 of the present invention;
Fig. 1b is a schematic diagram of the seed URLs and regular expressions in Embodiment 1 of the present invention;
Fig. 1c is a schematic diagram of the matching results associated with the target links in Embodiment 1 of the present invention;
Fig. 1d is a schematic diagram of the target links in Excel format in Embodiment 1 of the present invention;
Fig. 2 is a flowchart of an automated testing method for the crawl completeness rate of a web crawler in Embodiment 2 of the present invention;
Fig. 3 is a structural schematic diagram of an automated test device for the crawl completeness rate of a web crawler in Embodiment 3 of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.
Embodiment one
Fig. 1a is a flowchart of an automated testing method for the crawl completeness rate of a web crawler in Embodiment 1 of the present invention. The method can be executed by an automated test device for the crawl completeness rate of a web crawler. The device can be implemented in hardware and/or software and can be configured in a test machine on which an automated test tool (such as QuickTest Professional), a database (such as an Oracle database or a full-text database) and a browser are installed.
As shown in Fig. 1a, the method specifically comprises the following steps:
Step 11: access the web page associated with a seed URL read from the crawler seed library of the crawler program, and obtain a set number of candidate links from the link attributes of the web page.
In this embodiment, a crawler program is a program or script that automatically grabs web information according to certain rules, and a seed URL is the address of a website from which the crawler program is to grab information. Referring to Fig. 1b, if the crawler program needs to crawl information from 100 news or forum websites, for example news, blogs or forum posts, then the seed URLs comprise the addresses of those 100 news or forum websites. The seed URLs are stored in the crawler seed library. The link attributes of a web page are the links to the articles, news items, blog posts or forum posts that the page contains. For example, for the seed URL tieba.baidu.com, the associated web page is the Baidu Tieba forum, and the links to the posts it contains constitute the link attributes.
Specifically, a seed URL is obtained from the crawler seed library of the crawler program, the web page associated with the seed URL is accessed through a browser (such as Internet Explorer), and a set number of candidate links is obtained from the link attributes of the web page. The set number can be chosen according to the testing requirements; for example, it can be 30.
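Step 11 can be illustrated with a minimal Python sketch. It assumes the page has already been fetched (the embodiment uses a browser for this) and only shows how a set number of candidate links might be collected from the page's anchor elements. The names `LinkCollector` and `candidate_links`, and the sample HTML, are hypothetical and do not come from the patent.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects (title, absolute URL) pairs from <a href=...> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (title, url) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the seed URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.links.append((title, self._href))
            self._href = None


def candidate_links(seed_url, html, limit=30):
    """Return at most `limit` candidate links found in `html`."""
    collector = LinkCollector(seed_url)
    collector.feed(html)
    return collector.links[:limit]


# Hypothetical page content standing in for the page behind a seed URL.
sample_html = """
<a href="/p/1001">First post</a>
<a href="/p/1002">Second post</a>
<a href="https://example.com/ad">Sponsored</a>
"""
links = candidate_links("https://tieba.baidu.com/", sample_html, limit=2)
print(links)
```

The `limit` parameter plays the role of the set number; its default of 30 mirrors the example value given in the text.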
Step 12: screen the candidate links to obtain target links, and import the target links into the testing tool.
Preferably, screening the candidate links to obtain target links comprises: screening the candidate links according to a regular expression obtained from the template library of the crawler program, to obtain the target links. The template library stores the fields associated with the seed URLs and the regular expressions; the regular expression corresponds to the filter ID (see Fig. 1b) and is used for title filtering. In this embodiment the candidate links are screened with the crawler program's own regular expression to obtain the target links, for example 8 target links, which improves the accuracy of the crawl completeness rate subsequently determined from the target links. It should be noted that the candidate links can also be screened according to a screening rule other than a regular expression, and that the screening rule can be set according to the testing requirements.
The resulting target links are then imported into the testing tool, for example into the Query column of the data table (DataTable:Query) in the testing tool QuickTest Professional.
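The title filtering in Step 12 can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the pattern is hypothetical, and the sketch assumes that a candidate is kept when its title matches the regular expression (the text does not say whether matches are kept or discarded).

```python
import re


def screen_links(candidates, pattern):
    """Keep the candidates whose title matches the regular expression."""
    regex = re.compile(pattern)
    return [(title, url) for title, url in candidates if regex.search(title)]


candidates = [
    ("First post", "https://tieba.baidu.com/p/1001"),
    ("Sponsored", "https://example.com/ad"),
    ("Second post", "https://tieba.baidu.com/p/1002"),
]
# Hypothetical pattern; in the embodiment it would come from the crawler's template library.
targets = screen_links(candidates, r"post$")
print(targets)
```

Reusing the crawler's own expression means the test judges the crawler only on links it is actually expected to collect.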
Step 13: determine the crawl completeness rate of the crawler program according to the target links imported into the testing tool and the crawl result data of the crawler program.
In this embodiment, the crawl result data is the data grabbed by the crawler program. Specifically, the crawl completeness rate of the crawler program can be determined from the degree of matching between the target links and the crawl result data. Referring to Fig. 1c, for each target link, if there is crawl result data that matches the title or the Uniform Resource Locator (URL) address of that target link, the matching result associated with the target link is 1; otherwise the matching result is 0. The crawl completeness rate of the crawler program is determined from the sum of all matching results.
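The per-link matching results described above can be sketched in Python. The record layout of the crawl result data (dictionaries with `title` and `url` keys) is an assumption made for illustration; the patent does not specify how the crawler stores its results, and exact string equality is only one possible notion of "matching".

```python
def match_results(targets, crawl_results):
    """Return 1 for each target link found in the crawl results, else 0."""
    crawled_titles = {record["title"] for record in crawl_results}
    crawled_urls = {record["url"] for record in crawl_results}
    return [
        1 if (title in crawled_titles or url in crawled_urls) else 0
        for title, url in targets
    ]


targets = [
    ("First post", "https://tieba.baidu.com/p/1001"),
    ("Second post", "https://tieba.baidu.com/p/1002"),
    ("Third post", "https://tieba.baidu.com/p/1003"),
]
# Hypothetical crawl result data: one record matches by title, one by URL only.
crawl_results = [
    {"title": "First post", "url": "https://tieba.baidu.com/p/1001"},
    {"title": "(untitled)", "url": "https://tieba.baidu.com/p/1003"},
]
results = match_results(targets, crawl_results)
print(results, sum(results))
```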
In the technical solution provided by this embodiment, the web pages associated with the seed URLs in the crawler seed library of the crawler program are accessed, target links are screened out of the link attributes of those pages, and the crawl completeness rate of the crawler program is determined from the target links and the crawl result data of the crawler program. Compared with the prior art, in which the performance of a crawler program is verified manually, this improves the efficiency of testing the crawler program.
Illustratively, referring to Fig. 1d, after the candidate links are screened to obtain the target links, the method can further comprise: storing the title and/or the URL address of each target link in an Excel table. For example, the title of the target link is stored in column B of the Excel table, and the URL address of the target link is stored in column C. In this embodiment the titles and/or URL addresses of the target links are stored in an Excel table on the test machine rather than only in the testing tool. This prevents the target links from being lost in abnormal situations such as a power failure while the testing tool is running, and so improves the safety of the target links.
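A rough sketch of this backup step follows. Writing a native Excel workbook needs a third-party library, so the sketch substitutes a CSV file, which Excel can open; this substitution, the header row and the function name `save_targets` are all assumptions. Column A is left empty so that titles land in column B and URL addresses in column C, as in the example above.

```python
import csv
import os
import tempfile


def save_targets(targets, path):
    """Write target links to a CSV file: column B = title, column C = URL."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["", "title", "url"])  # header row (an assumption)
        for title, url in targets:
            writer.writerow(["", title, url])  # column A intentionally empty


targets = [
    ("First post", "https://tieba.baidu.com/p/1001"),
    ("Second post", "https://tieba.baidu.com/p/1002"),
]
# A temporary directory stands in for a location on the test machine.
path = os.path.join(tempfile.mkdtemp(), "targets.csv")
save_targets(targets, path)
print("saved to", path)
```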
Embodiment two
On the basis of Embodiment 1 above, this embodiment provides a further automated testing method for the crawl completeness rate of a web crawler. Fig. 2 is a flowchart of an automated testing method for the crawl completeness rate of a web crawler in Embodiment 2 of the present invention. The method can be executed by an automated test device for the crawl completeness rate of a web crawler; the device can be implemented in hardware and/or software and is configured in a test machine. As shown in Fig. 2, the method specifically comprises the following steps:
Step 21: access the web page associated with a seed URL read from the crawler seed library of the crawler program, and obtain a set number of candidate links from the link attributes of the web page.
Step 22: screen the candidate links to obtain target links, and import the target links into the testing tool.
Step 23: for each target link imported into the testing tool, determine whether there is crawl result data that matches the title or the Uniform Resource Locator (URL) address of that target link; if so, add 1 to the match parameter, where the initial value of the match parameter is 0.
Specifically, because the initial value of the match parameter is 0, its final value accurately reflects the degree of matching between the target links and the crawl result data.
Step 24: determine the crawl completeness rate of the crawler program according to the total number of target links and the value of the match parameter.
Illustratively, determining the crawl completeness rate of the crawler program according to the number of target links and the value of the match parameter can comprise:
calculating the crawl completeness rate of the crawler program according to the following formula:
k = n/m, where k is the crawl completeness rate, n is the value of the match parameter, and m is the total number of target links.
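Steps 23 and 24 can be sketched together in Python. The counter `n` plays the role of the match parameter and starts at 0, and the result is k = n/m. The crawl result record layout (`title` and `url` keys) and the handling of an empty target list are assumptions added for illustration.

```python
def crawl_completeness_rate(targets, crawl_results):
    """Compute k = n / m for a list of (title, url) target links."""
    m = len(targets)
    if m == 0:
        # Not covered by the text: with no target links, k is undefined.
        raise ValueError("at least one target link is required")
    crawled_titles = {record["title"] for record in crawl_results}
    crawled_urls = {record["url"] for record in crawl_results}
    n = 0  # match parameter, initially 0
    for title, url in targets:
        if title in crawled_titles or url in crawled_urls:
            n += 1
    return n / m


# Hypothetical data: 8 target links, of which the crawler collected 6.
targets = [(f"Post {i}", f"https://example.com/p/{i}") for i in range(8)]
crawl_results = [
    {"title": f"Post {i}", "url": f"https://example.com/p/{i}"} for i in range(6)
]
k = crawl_completeness_rate(targets, crawl_results)
print(k)  # 6 / 8 = 0.75
```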
In the technical solution provided by this embodiment, the web pages associated with the seed URLs in the crawler seed library of the crawler program are accessed, target links are screened out of the link attributes of those pages, the value of the match parameter is obtained from the matching relationship between the target links and the crawl result data of the crawler program, and the crawl completeness rate of the crawler program is determined from the value of the match parameter and the total number of target links. This improves the efficiency of testing the crawler program.
Embodiment three
Fig. 3 is a structural schematic diagram of an automated test device for the crawl completeness rate of a web crawler in Embodiment 3 of the present invention. The device is configured in a test machine. As shown in Fig. 3, the device can specifically comprise:
a candidate link module 31, configured to access the web page associated with a seed URL read from the crawler seed library of the crawler program, and to obtain a set number of candidate links from the link attributes of the web page;
a target link module 32, configured to screen the candidate links to obtain target links, and to import the target links into the testing tool;
a crawl completeness rate module 33, configured to determine the crawl completeness rate of the crawler program according to the target links imported into the testing tool and the crawl result data of the crawler program.
Illustratively, the target link module 32 can specifically be configured to:
screen the candidate links according to a regular expression obtained from the template library of the crawler program, to obtain the target links.
Illustratively, the crawl completeness rate module 33 can comprise:
a match parameter unit, configured to determine, for each target link imported into the testing tool, whether there is crawl result data that matches the title or the URL address of that target link, and if so, to add 1 to the match parameter, where the initial value of the match parameter is 0;
a crawl completeness rate unit, configured to determine the crawl completeness rate of the crawler program according to the total number of target links and the value of the match parameter.
Illustratively, the crawl completeness rate unit can specifically be configured to:
calculate the crawl completeness rate of the crawler program according to the following formula:
k = n/m, where k is the crawl completeness rate, n is the value of the match parameter, and m is the total number of target links.
Illustratively, the above automated test device for the crawl completeness rate of a web crawler can further comprise:
a target link storage module, configured to store the title and the URL address of each target link in an Excel table after the candidate links have been screened to obtain the target links.
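One way to picture how modules 31, 32 and 33 fit together is the following Python sketch, in which each module is modelled as a plain callable and the device simply chains them. The class name `CrawlRateTestDevice`, the callable signatures and the stub data are hypothetical; the optional storage module is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Link = Tuple[str, str]  # (title, URL address)


@dataclass
class CrawlRateTestDevice:
    candidate_link_module: Callable[[str], List[Link]]      # module 31
    target_link_module: Callable[[List[Link]], List[Link]]  # module 32
    rate_module: Callable[[List[Link]], float]              # module 33

    def run(self, seed_url: str) -> float:
        """Chain the three modules for a single seed URL."""
        candidates = self.candidate_link_module(seed_url)
        targets = self.target_link_module(candidates)
        return self.rate_module(targets)


# Stub modules with made-up data, only to show the data flow.
crawled_titles = {"A"}
device = CrawlRateTestDevice(
    candidate_link_module=lambda seed: [
        ("A", seed + "a"), ("B", seed + "b"), ("Ad", seed + "ad"),
    ],
    target_link_module=lambda links: [l for l in links if l[0] != "Ad"],
    rate_module=lambda targets: (
        sum(1 for title, _ in targets if title in crawled_titles) / len(targets)
    ),
)
rate = device.run("https://example.com/")
print(rate)  # 1 of 2 target links was crawled
```

Passing the modules in as callables keeps each one independently replaceable, which loosely mirrors the statement that the device can be implemented in hardware and/or software.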
The automated test device for the crawl completeness rate of a web crawler provided by this embodiment belongs to the same inventive concept as the automated testing method provided by any embodiment of the present invention. It can execute that method, and it has the functional modules and beneficial effects corresponding to executing it. For technical details not described in this embodiment, refer to the automated testing method for the crawl completeness rate of a web crawler provided by any embodiment of the present invention.
Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the scope of protection of the invention. Therefore, although the invention has been described in further detail through the above embodiments, it is not limited to them; it may include other equivalent embodiments without departing from the inventive concept, and its scope is determined by the appended claims.
Claims (6)
1. An automated testing method for the crawl completeness rate of a web crawler, characterized by comprising:
an automated test device for the crawl completeness rate of a web crawler accessing the web page associated with a seed URL read from the crawler seed library of a crawler program, and obtaining a set number of candidate links from the link attributes of the web page;
the automated test device screening the candidate links to obtain target links, and importing the target links into a testing tool;
the automated test device determining the crawl completeness rate of the crawler program according to the target links imported into the testing tool and the crawl result data of the crawler program;
wherein, after the automated test device screens the candidate links to obtain the target links, the method further comprises:
the automated test device storing the title and/or the URL address of each target link in an Excel table;
and wherein the automated test device screening the candidate links to obtain the target links comprises:
the automated test device screening the candidate links according to a regular expression obtained from the template library of the crawler program, to obtain the target links.
2. The method according to claim 1, characterized in that the automated test device determining the crawl completeness rate of the crawler program according to the target links imported into the testing tool and the crawl result data of the crawler program comprises:
the automated test device determining, for each target link imported into the testing tool, whether there is crawl result data that matches the title or the URL address of that target link, and if so, adding 1 to a match parameter, where the initial value of the match parameter is 0;
the automated test device determining the crawl completeness rate of the crawler program according to the total number of target links and the value of the match parameter.
3. The method according to claim 2, characterized in that the automated test device determining the crawl completeness rate of the crawler program according to the number of target links and the value of the match parameter comprises:
the automated test device calculating the crawl completeness rate of the crawler program according to the following formula:
k = n/m, where k is the crawl completeness rate, n is the value of the match parameter, and m is the total number of target links.
4. An automated test device for the crawl completeness rate of a web crawler, characterized by comprising:
a candidate link module, configured to access the web page associated with a seed URL read from the crawler seed library of a crawler program, and to obtain a set number of candidate links from the link attributes of the web page;
a target link module, configured to screen the candidate links to obtain target links, and to import the target links into a testing tool;
a crawl completeness rate module, configured to determine the crawl completeness rate of the crawler program according to the target links imported into the testing tool and the crawl result data of the crawler program;
the automated test device further comprising:
a target link storage module, configured to store the title and the URL address of each target link in an Excel table after the candidate links have been screened to obtain the target links;
wherein the target link module is specifically configured to:
screen the candidate links according to a regular expression obtained from the template library of the crawler program, to obtain the target links.
5. The device according to claim 4, characterized in that the crawl completeness rate module comprises:
a match parameter unit, configured to determine, for each target link imported into the testing tool, whether there is crawl result data that matches the title or the URL address of that target link, and if so, to add 1 to a match parameter, where the initial value of the match parameter is 0;
a crawl completeness rate unit, configured to determine the crawl completeness rate of the crawler program according to the total number of target links and the value of the match parameter.
6. The device according to claim 5, characterized in that the crawl completeness rate unit is specifically configured to:
calculate the crawl completeness rate of the crawler program according to the following formula:
k = n/m, where k is the crawl completeness rate, n is the value of the match parameter, and m is the total number of target links.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510957702.3A CN105630673B (en) | 2015-12-17 | 2015-12-17 | Automated testing method and device for the crawl completeness rate of a web crawler |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510957702.3A CN105630673B (en) | 2015-12-17 | 2015-12-17 | Automated testing method and device for the crawl completeness rate of a web crawler |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105630673A CN105630673A (en) | 2016-06-01 |
CN105630673B true CN105630673B (en) | 2018-12-25 |
Family
ID=56045643
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510957702.3A Active CN105630673B (en) | 2015-12-17 | 2015-12-17 | Automated testing method and device for the crawl completeness rate of a web crawler |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105630673B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111949852A (en) * | 2020-08-31 | 2020-11-17 | 东华理工大学 | Macroscopic economy analysis method and system based on internet big data |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN202383681U (en) * | 2011-12-23 | 2012-08-15 | 江苏省现代企业信息化应用支撑软件工程技术研发中心 | Webpage acquiring device based on gathered crawlers |
CN102662954A (en) * | 2012-03-02 | 2012-09-12 | 杭州电子科技大学 | Method for implementing topical crawler system based on learning URL string information |
CN102930059A (en) * | 2012-11-26 | 2013-02-13 | 电子科技大学 | Method for designing focused crawler |
CN102929920A (en) * | 2012-09-19 | 2013-02-13 | 北京奇虎科技有限公司 | Web-information-extraction-based monitoring method and device for software updating information |
CN103984749A (en) * | 2014-05-27 | 2014-08-13 | 电子科技大学 | Focused crawler method based on link analysis |
CN104462158A (en) * | 2013-09-25 | 2015-03-25 | 北大方正集团有限公司 | Data grabbing method and data grabbing system |
CN104794193A (en) * | 2015-04-17 | 2015-07-22 | 南京大学 | Webpage increment capture method for valid link acquisition |
Non-Patent Citations (3)
Title |
---|
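Focused crawling enhanced by CBP-SLC; Tao Peng et al.; Knowledge-Based Systems; 20130630; Sections 5 and 6.1.1 * |
An adaptive topical crawler algorithm based on link and content analysis (一种基于链接和内容分析的自适应主题爬虫算法); Zhu Qingsheng (朱庆生) et al.; Computer and Modernization (《计算机与现代化》); 20150930; full text * |
Customizable focused web crawler (可定制的聚焦网络爬虫); Zou Hailiang (邹海亮); China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 20091031; main text, pp. 6-11 * |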
Also Published As
Publication number | Publication date |
---|---|
CN105630673A (en) | 2016-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103605738B (en) | Web page access data statistical method and device | |
CN102930059B (en) | Method for designing focused crawler | |
CN103530365B (en) | Obtain the method and system of the download link of resource | |
CN103365839B (en) | The recommendation searching method and device of a kind of search engine | |
CN107957957A (en) | The acquisition methods and device of test case | |
CN102663048B (en) | Method and device for providing search result | |
CN105404699A (en) | Method, device and server for searching articles of finance and economics | |
CN103294732B (en) | Webpage capture method and reptile | |
CN109376291B (en) | Website fingerprint information scanning method and device based on web crawler | |
CN104182548B (en) | Webpage updates processing method and processing device | |
CN106933906B (en) | Data multi-dimensional query method and device | |
CN107437026B (en) | Malicious webpage advertisement detection method based on advertisement network topology | |
CN102609511B (en) | Navigation page data processing method and processing device | |
CN102760151A (en) | Implementation method of open source software acquisition and searching system | |
US20060274767A1 (en) | System and method for collecting, processing and presenting selected information from selected sources via a single website | |
CN105335246B (en) | A kind of program crashing defect self-repairing method based on question and answer web analytics | |
CN103927400A (en) | Web site product detailed information classification crawling and product information base establishing method | |
CN101441629A (en) | Automatic acquiring method of non-structured web page information | |
CN104462445A (en) | Webpage access data processing method and webpage access data processing device | |
CN103390048B (en) | Chained address update method and device | |
CN106547803A (en) | The method and apparatus for crawling website incremental resource | |
CN103605744B (en) | The analysis method and device of site search engine data on flows | |
CN105630673B (en) | Automated testing method and device for the crawl completeness rate of a web crawler | |
CN103902725B (en) | The acquisition methods of search engine optimization information and device | |
CN104951476B (en) | Method and device for confirming link rank in website |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||