CN105630673B - Automated testing method and device for web crawler rate - Google Patents

Automated testing method and device for web crawler rate

Info

Publication number
CN105630673B
Authority
CN
China
Prior art keywords
crawlers
object linking
full rate
automatic test
test device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510957702.3A
Other languages
Chinese (zh)
Other versions
CN105630673A (en)
Inventor
徐香联
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ruian Technology Co Ltd
Original Assignee
Beijing Ruian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ruian Technology Co Ltd
Priority to CN201510957702.3A
Publication of CN105630673A
Application granted
Publication of CN105630673B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3612 Software analysis for verifying properties of programs by runtime analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Embodiments of the invention disclose an automated testing method and device for web crawler rate. The method comprises: accessing the web page associated with a seed URL read from the crawler seed library of a crawler program, and obtaining a set number of candidate links from the link attribute of the web page; screening the candidate links to obtain target links, and importing the target links into a test tool; and determining the crawl completeness rate of the crawler program according to the target links imported into the test tool and the crawl result data of the crawler program. Compared with the prior art, in which the performance of a crawler program is verified manually, the technical solution of the embodiments improves the efficiency of testing crawler programs.

Description

Automated testing method and device for web crawler rate
Technical field
Embodiments of the present invention relate to the field of software testing, and in particular to an automated testing method and device for web crawler rate.
Background
In recent years the number of Internet users in China has grown explosively, and websites of every kind have sprung up like mushrooms after rain. Faced with such a mass of information, valuable data needs to be screened and extracted: governments can use it for public opinion analysis and network security monitoring, and enterprises can use it for market research and media analysis.
With information growing explosively, web crawler technology is particularly important. Whether a crawler program fetches the desired information in time, whether the crawled web page data is complete, and whether the information is correct are important indicators of product performance. However, manually verifying whether the data of thousands of web pages has been crawled promptly, completely and correctly, and whether all of it has been stored, is time-consuming and laborious. A test method that can automatically test a web crawler's crawl completeness rate is therefore urgently needed to improve the efficiency of testing crawler programs.
Summary of the invention
The present invention provides an automated testing method and device for web crawler rate, so as to improve the efficiency of testing crawler programs.
In a first aspect, an embodiment of the present invention provides an automated testing method for web crawler rate, comprising:
accessing the web page associated with a seed URL read from the crawler seed library of a crawler program, and obtaining a set number of candidate links from the link attribute of the web page;
screening the candidate links to obtain target links, and importing the target links into a test tool;
determining the crawl completeness rate of the crawler program according to the target links imported into the test tool and the crawl result data of the crawler program.
In a second aspect, an embodiment of the present invention provides an automated testing device for web crawler rate, comprising:
a candidate link module, configured to access the web page associated with a seed URL read from the crawler seed library of a crawler program, and to obtain a set number of candidate links from the link attribute of the web page;
a target link module, configured to screen the candidate links to obtain target links, and to import the target links into a test tool;
a crawl completeness rate module, configured to determine the crawl completeness rate of the crawler program according to the target links imported into the test tool and the crawl result data of the crawler program.
The technical solution provided by the embodiments of the present invention accesses the web page associated with a seed URL in the crawler seed library of the crawler program, screens target links out of the link attribute of the web page, and determines the crawl completeness rate of the crawler program from the target links and the crawl result data of the crawler program. Compared with the prior art, in which the performance of a crawler program is verified manually, this improves the efficiency of testing crawler programs.
Brief description of the drawings
Fig. 1a is a flowchart of an automated testing method for web crawler rate in Embodiment 1 of the present invention;
Fig. 1b is a schematic diagram of the seed URLs and regular expressions in Embodiment 1 of the present invention;
Fig. 1c is a schematic diagram of the matching results associated with the target links in Embodiment 1 of the present invention;
Fig. 1d is a schematic diagram of the target links in Excel format in Embodiment 1 of the present invention;
Fig. 2 is a flowchart of an automated testing method for web crawler rate in Embodiment 2 of the present invention;
Fig. 3 is a structural schematic diagram of an automated testing device for web crawler rate in Embodiment 3 of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.
Embodiment 1
Fig. 1a is a flowchart of an automated testing method for web crawler rate in Embodiment 1 of the present invention. The method may be executed by an automated testing device for web crawler rate. The device may be implemented in hardware and/or software and may be configured in a test machine on which an automated test tool (such as QuickTest Professional), a database (such as an Oracle database or a full-text database) and a browser are installed.
As shown in Fig. 1a, the method comprises the following steps:
Step 11: access the web page associated with a seed URL read from the crawler seed library of the crawler program, and obtain a set number of candidate links from the link attribute of the web page.
In this embodiment, a crawler program is a program or script that automatically fetches web information according to certain rules, and a seed URL is the address of a website from which the crawler program is to fetch information. Referring to Fig. 1b, if the crawler program needs to crawl information (news, blogs or forum posts, for example) from 100 news or forum websites, the seed URLs comprise the addresses of those 100 websites. Seed URLs are stored in the crawler seed library. The link attribute of a web page refers to the links to the articles, news items, blog posts or forum posts that the page contains. For example, for the seed URL tieba.baidu.com the associated web page is the Baidu Tieba forum, and the links to the posts on that forum constitute its link attribute.
Specifically, a seed URL is obtained from the crawler seed library of the crawler program, the web page associated with the seed URL is accessed through a browser (such as Internet Explorer), and a set number of candidate links is obtained from the link attribute of the page. The set number can be chosen according to testing requirements; for example, it may be 30.
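As a rough illustration of Step 11, the following Python sketch collects the first N links from a page's HTML. It is only a sketch under stated assumptions: the patent drives a real browser from the test tool, whereas here the page is a hard-coded string, and the names LinkCollector, candidate_links and SAMPLE_PAGE are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (title, absolute URL) pairs from <a href=...> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (title, url) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def candidate_links(html, seed_url, limit=30):
    """Return at most `limit` candidate links found in the seed page's HTML."""
    collector = LinkCollector(seed_url)
    collector.feed(html)
    return collector.links[:limit]


# Hypothetical page content standing in for the page behind a seed URL.
SAMPLE_PAGE = """
<html><body>
  <a href="/p/1001">First post</a>
  <a href="/p/1002">Second post</a>
  <a href="http://example.com/about">About</a>
</body></html>
"""

links = candidate_links(SAMPLE_PAGE, "http://tieba.baidu.com/", limit=2)
```

The default limit of 30 mirrors the example value given in the text; relative links are resolved against the seed URL so that they can later be compared with crawled URLs.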
Step 12: screen the candidate links to obtain target links, and import the target links into the test tool.
Preferably, screening the candidate links to obtain target links comprises: screening the candidate links according to a regular expression obtained from the template library of the crawler program to obtain the target links. The template library stores the fields associated with each seed URL together with the regular expressions, and the regular expression here refers to the filter ID (see Fig. 1b), which is used to filter titles. In this embodiment the crawler program's own regular expression is used to screen the candidate links, for example yielding 8 target links, which improves the accuracy of the crawl completeness rate subsequently determined from the target links. Note that the candidate links may also be screened by a rule other than a regular expression, and that rule can be set according to testing requirements.
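The screening in Step 12 can be sketched in Python as a title filter. This is a minimal sketch: the source does not say whether titles that match the expression are kept or dropped, so keeping them is an assumption, and the function name screen_links, the sample data and the pattern are all hypothetical.

```python
import re


def screen_links(candidates, pattern):
    """Keep the candidates whose title matches the filter pattern.

    Assumption: a match means "keep"; the patent only says the
    expression is used for title filtering.
    """
    title_filter = re.compile(pattern)
    return [(title, url) for title, url in candidates
            if title_filter.search(title)]


# Hypothetical candidate links as (title, URL) pairs.
candidates = [
    ("Crawler test report", "http://example.com/p/1"),
    ("Advertisement", "http://example.com/ad/2"),
    ("Crawler FAQ", "http://example.com/p/3"),
]
targets = screen_links(candidates, r"^Crawler\b")
```

Reusing the crawler's own expression, as the text suggests, means the test only counts links that the crawler itself would have been expected to accept.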
The target links obtained are then imported into the test tool, for example into the Query field of the data table (DataTable:Query) in QuickTest Professional.
Step 13: determine the crawl completeness rate of the crawler program according to the target links imported into the test tool and the crawl result data of the crawler program.
In this embodiment, crawl result data is the data fetched by the crawler program. Specifically, the crawl completeness rate can be determined from the degree of matching between the target links and the crawl result data. Referring to Fig. 1c, for each target link, if there is crawl result data matching the title or the uniform resource locator (URL) address of that target link, the matching result associated with the target link is 1; otherwise it is 0. The crawl completeness rate of the crawler program is determined from the sum of all matching results.
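The per-link matching described for Step 13 can be sketched as follows. This is an illustration, not the patented implementation: exact string equality is assumed for "matching", the crawl result data is assumed to be a list of records with "title" and "url" keys, and match_results is a hypothetical name.

```python
def match_results(targets, crawled):
    """Return one 0/1 flag per target link: 1 if its title or URL was crawled."""
    crawled_titles = {record["title"] for record in crawled}
    crawled_urls = {record["url"] for record in crawled}
    return [
        1 if title in crawled_titles or url in crawled_urls else 0
        for title, url in targets
    ]


# Hypothetical target links and crawl result data.
targets = [
    ("First post", "http://example.com/p/1"),
    ("Second post", "http://example.com/p/2"),
    ("Third post", "http://example.com/p/3"),
]
crawled = [
    {"title": "First post", "url": "http://example.com/p/1"},
    {"title": "Third post (edited)", "url": "http://example.com/p/3"},
]

flags = match_results(targets, crawled)
matched_total = sum(flags)
```

The third link still counts as matched because its URL was crawled even though its title differs, which reflects the "title or URL" condition in the text.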
The technical solution provided by this embodiment accesses the web page associated with a seed URL in the crawler seed library of the crawler program, screens target links out of the link attribute of the web page, and determines the crawl completeness rate of the crawler program from the target links and the crawl result data of the crawler program. Compared with the prior art, in which the performance of a crawler program is verified manually, this improves the efficiency of testing crawler programs.
Illustratively, referring to Fig. 1d, after the candidate links are screened to obtain the target links, the method may further comprise: storing the titles and/or URL addresses of the target links in an Excel table. For example, the titles of the target links are stored in column B of the Excel table and the URL addresses in column C. Because the titles and/or URL addresses are stored in an Excel table on the test machine rather than only in the test tool, the target links are not lost if the test tool fails abnormally (through a power failure, for example), which improves the safety of the target links.
Embodiment 2
This embodiment provides a further automated testing method for web crawler rate on the basis of Embodiment 1. Fig. 2 is a flowchart of an automated testing method for web crawler rate in Embodiment 2 of the present invention. The method may be executed by an automated testing device for web crawler rate; the device may be implemented in hardware and/or software and configured in a test machine. As shown in Fig. 2, the method comprises the following steps:
Step 21: access the web page associated with a seed URL read from the crawler seed library of the crawler program, and obtain a set number of candidate links from the link attribute of the web page.
Step 22: screen the candidate links to obtain target links, and import the target links into the test tool.
Step 23: for each target link imported into the test tool, determine whether there is crawl result data matching the title or URL address of that target link; if so, add 1 to the match parameter, whose initial value is 0.
Specifically, because the match parameter starts at 0, its final value accurately reflects the degree of matching between the target links and the crawl result data.
Step 24: determine the crawl completeness rate of the crawler program according to the total number of target links and the value of the match parameter.
Illustratively, determining the crawl completeness rate of the crawler program according to the number of target links and the value of the match parameter may comprise:
calculating the crawl completeness rate of the crawler program according to the following formula:
k = n/m, where k is the crawl completeness rate, n is the value of the match parameter, and m is the total number of target links.
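The formula can be written as a small Python helper. This is a sketch with assumptions: the input checks are additions not described in the source, the name crawl_completeness_rate is hypothetical, and the count of 6 matches is invented for the worked example (the 8 target links echo the example in Embodiment 1).

```python
def crawl_completeness_rate(matched, total):
    """Compute k = n / m, the fraction of target links found in the crawl results."""
    if total <= 0:
        raise ValueError("total number of target links must be positive")
    if not 0 <= matched <= total:
        raise ValueError("matched count must lie between 0 and total")
    return matched / total


# Worked example: 8 target links, 6 of them matched by crawl result data.
k = crawl_completeness_rate(6, 8)
```

With n = 6 and m = 8 the result is k = 6/8 = 0.75, meaning the crawler retrieved three quarters of the sampled target links.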
The technical solution provided by this embodiment accesses the web page associated with a seed URL in the crawler seed library of the crawler program, screens target links out of the link attribute of the web page, obtains the value of the match parameter from the matching relationship between the target links and the crawl result data of the crawler program, and determines the crawl completeness rate of the crawler program from the value of the match parameter and the total number of target links, which improves the efficiency of testing crawler programs.
Embodiment 3
Fig. 3 is a structural schematic diagram of an automated testing device for web crawler rate in Embodiment 3 of the present invention. The device is configured in a test machine. As shown in Fig. 3, the automated testing device for web crawler rate may comprise:
a candidate link module 31, configured to access the web page associated with a seed URL read from the crawler seed library of the crawler program, and to obtain a set number of candidate links from the link attribute of the web page;
a target link module 32, configured to screen the candidate links to obtain target links, and to import the target links into a test tool;
a crawl completeness rate module 33, configured to determine the crawl completeness rate of the crawler program according to the target links imported into the test tool and the crawl result data of the crawler program.
Illustratively, the target link module 32 may be specifically configured to:
screen the candidate links according to a regular expression obtained from the template library of the crawler program to obtain the target links.
Illustratively, the crawl completeness rate module 33 may comprise:
a match parameter unit, configured to determine, for each target link imported into the test tool, whether there is crawl result data matching the title or URL address of that target link, and if so to add 1 to the match parameter, whose initial value is 0;
a crawl completeness rate unit, configured to determine the crawl completeness rate of the crawler program according to the total number of target links and the value of the match parameter.
Illustratively, the crawl completeness rate unit may be specifically configured to:
calculate the crawl completeness rate of the crawler program according to the following formula:
k = n/m, where k is the crawl completeness rate, n is the value of the match parameter, and m is the total number of target links.
Illustratively, the automated testing device for web crawler rate may further comprise:
a target link storage module, configured to store the titles and URL addresses of the target links in an Excel table after the candidate links are screened to obtain the target links.
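One way to picture how the modules fit together is a small Python class whose fields correspond to modules 31, 32 and 33. This is only a structural sketch under stated assumptions: the class name CrawlRateTester, its field names and the sample data are hypothetical, each module is reduced to a plain callable, and the optional storage module is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Link = Tuple[str, str]  # (title, URL)


@dataclass
class CrawlRateTester:
    fetch_candidates: Callable[[], List[Link]]    # role of candidate link module 31
    screen: Callable[[List[Link]], List[Link]]    # role of target link module 32
    is_crawled: Callable[[Link], bool]            # lookup used by rate module 33

    def run(self) -> float:
        """Run the modules in order and return the crawl completeness rate."""
        targets = self.screen(self.fetch_candidates())
        if not targets:
            raise ValueError("no target links to test")
        matched = sum(1 for link in targets if self.is_crawled(link))
        return matched / len(targets)


# Hypothetical wiring: three candidates, one screened out, one of two crawled.
tester = CrawlRateTester(
    fetch_candidates=lambda: [("A", "u1"), ("B", "u2"), ("ad", "u3")],
    screen=lambda links: [link for link in links if link[0] != "ad"],
    is_crawled=lambda link: link[1] == "u1",
)
rate = tester.run()
```

Passing the modules in as callables keeps the sketch independent of any particular browser, test tool or database.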
The automated testing device for web crawler rate provided by this embodiment belongs to the same inventive concept as the automated testing method for web crawler rate provided by any embodiment of the present invention. It can execute that method, and it has the functional modules and beneficial effects corresponding to executing it. For technical details not described in this embodiment, refer to the automated testing method for web crawler rate provided by any embodiment of the present invention.
Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here, and that they can make various obvious changes, readjustments and substitutions without departing from the protection scope of the invention. Therefore, although the invention has been described in some detail through the above embodiments, it is not limited to them; it may include other equivalent embodiments without departing from the inventive concept, and its scope is determined by the appended claims.

Claims (6)

1. An automated testing method for web crawler rate, characterized by comprising:
an automated testing device for web crawler rate accessing the web page associated with a seed URL read from the crawler seed library of a crawler program, and obtaining a set number of candidate links from the link attribute of the web page;
the automated testing device screening the candidate links to obtain target links, and importing the target links into a test tool;
the automated testing device determining the crawl completeness rate of the crawler program according to the target links imported into the test tool and the crawl result data of the crawler program;
wherein, after the automated testing device screens the candidate links to obtain the target links, the method further comprises:
the automated testing device storing the titles and/or URL addresses of the target links in an Excel table;
and wherein the automated testing device screening the candidate links to obtain the target links comprises:
the automated testing device screening the candidate links according to a regular expression obtained from the template library of the crawler program to obtain the target links.
2. The method according to claim 1, characterized in that the automated testing device determining the crawl completeness rate of the crawler program according to the target links imported into the test tool and the crawl result data of the crawler program comprises:
the automated testing device determining, for each target link imported into the test tool, whether there is crawl result data matching the title or URL address of that target link, and if so adding 1 to a match parameter, wherein the initial value of the match parameter is 0;
the automated testing device determining the crawl completeness rate of the crawler program according to the total number of target links and the value of the match parameter.
3. The method according to claim 2, characterized in that the automated testing device determining the crawl completeness rate of the crawler program according to the number of target links and the value of the match parameter comprises:
the automated testing device calculating the crawl completeness rate of the crawler program according to the following formula:
k = n/m, where k is the crawl completeness rate, n is the value of the match parameter, and m is the total number of target links.
4. An automated testing device for web crawler rate, characterized by comprising:
a candidate link module, configured to access the web page associated with a seed URL read from the crawler seed library of a crawler program, and to obtain a set number of candidate links from the link attribute of the web page;
a target link module, configured to screen the candidate links to obtain target links, and to import the target links into a test tool;
a crawl completeness rate module, configured to determine the crawl completeness rate of the crawler program according to the target links imported into the test tool and the crawl result data of the crawler program;
the automated testing device further comprising:
a target link storage module, configured to store the titles and URL addresses of the target links in an Excel table after the candidate links are screened to obtain the target links;
wherein the target link module is specifically configured to:
screen the candidate links according to a regular expression obtained from the template library of the crawler program to obtain the target links.
5. The device according to claim 4, characterized in that the crawl completeness rate module comprises:
a match parameter unit, configured to determine, for each target link imported into the test tool, whether there is crawl result data matching the title or URL address of that target link, and if so to add 1 to a match parameter, wherein the initial value of the match parameter is 0;
a crawl completeness rate unit, configured to determine the crawl completeness rate of the crawler program according to the total number of target links and the value of the match parameter.
6. The device according to claim 5, characterized in that the crawl completeness rate unit is specifically configured to:
calculate the crawl completeness rate of the crawler program according to the following formula:
k = n/m, where k is the crawl completeness rate, n is the value of the match parameter, and m is the total number of target links.
CN201510957702.3A 2015-12-17 2015-12-17 Automated testing method and device for web crawler rate Active CN105630673B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510957702.3A CN105630673B (en) 2015-12-17 2015-12-17 Automated testing method and device for web crawler rate

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510957702.3A CN105630673B (en) 2015-12-17 2015-12-17 Automated testing method and device for web crawler rate

Publications (2)

Publication Number Publication Date
CN105630673A CN105630673A (en) 2016-06-01
CN105630673B true CN105630673B (en) 2018-12-25

Family

ID=56045643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510957702.3A Active CN105630673B (en) 2015-12-17 2015-12-17 Automated testing method and device for web crawler rate

Country Status (1)

Country Link
CN (1) CN105630673B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111949852A (en) * 2020-08-31 2020-11-17 东华理工大学 Macroscopic economy analysis method and system based on internet big data

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN202383681U (en) * 2011-12-23 2012-08-15 江苏省现代企业信息化应用支撑软件工程技术研发中心 Webpage acquiring device based on gathered crawlers
CN102662954A (en) * 2012-03-02 2012-09-12 杭州电子科技大学 Method for implementing topical crawler system based on learning URL string information
CN102930059A (en) * 2012-11-26 2013-02-13 电子科技大学 Method for designing focused crawler
CN102929920A (en) * 2012-09-19 2013-02-13 北京奇虎科技有限公司 Web-information-extraction-based monitoring method and device for software updating information
CN103984749A (en) * 2014-05-27 2014-08-13 电子科技大学 Focused crawler method based on link analysis
CN104462158A (en) * 2013-09-25 2015-03-25 北大方正集团有限公司 Data grabbing method and data grabbing system
CN104794193A (en) * 2015-04-17 2015-07-22 南京大学 Webpage increment capture method for valid link acquisition

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Focused crawling enhanced by CBP-SLC;Tao Peng 等;《Knowledge-Based Systems》;20130630;第5节,第6.1.1节 *
一种基于链接和内容分析的自适应主题爬虫算法;朱庆生 等;《计算机与现代化》;20150930;全文 *
可定制的聚焦网络爬虫;邹海亮;《中国优秀硕士学位论文全文数据库 信息科技辑》;20091031;正文第6-11页 *

Also Published As

Publication number Publication date
CN105630673A (en) 2016-06-01

Similar Documents

Publication Publication Date Title
CN103605738B (en) Web page access data statistical method and device
CN102930059B (en) Method for designing focused crawler
CN103530365B (en) Obtain the method and system of the download link of resource
CN103365839B (en) The recommendation searching method and device of a kind of search engine
CN107957957A (en) The acquisition methods and device of test case
CN102663048B (en) Method and device for providing search result
CN105404699A (en) Method, device and server for searching articles of finance and economics
CN103294732B (en) Webpage capture method and reptile
CN109376291B (en) Website fingerprint information scanning method and device based on web crawler
CN104182548B (en) Webpage updates processing method and processing device
CN106933906B (en) Data multi-dimensional query method and device
CN107437026B (en) Malicious webpage advertisement detection method based on advertisement network topology
CN102609511B (en) Navigation page data processing method and processing device
CN102760151A (en) Implementation method of open source software acquisition and searching system
US20060274767A1 (en) System and method for collecting, processing and presenting selected information from selected sources via a single website
CN105335246B (en) A kind of program crashing defect self-repairing method based on question and answer web analytics
CN103927400A (en) Web site product detailed information classification crawling and product information base establishing method
CN101441629A (en) Automatic acquiring method of non-structured web page information
CN104462445A (en) Webpage access data processing method and webpage access data processing device
CN103390048B (en) Chained address update method and device
CN106547803A (en) The method and apparatus for crawling website incremental resource
CN103605744B (en) The analysis method and device of site search engine data on flows
CN105630673B (en) A kind of automated testing method and device of web crawlers rate
CN103902725B (en) The acquisition methods of search engine optimization information and device
CN104951476B (en) Method and device for confirming link rank in website

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant