CN110333980A - Test method and device for a web crawler system, storage medium, and electronic device - Google Patents


Info

Publication number
CN110333980A
Authority
CN
China
Prior art keywords
crawler
task
machine
machines
clusters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910444805.8A
Other languages
Chinese (zh)
Inventor
吕小立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Smart Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Smart Technology Co Ltd filed Critical OneConnect Smart Technology Co Ltd
Priority to CN201910444805.8A priority Critical patent/CN110333980A/en
Publication of CN110333980A publication Critical patent/CN110333980A/en
Priority to PCT/CN2019/123059 priority patent/WO2020238131A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present disclosure relates to a test method and device for a web crawler system, and belongs to the technical field of testing tools. The method comprises: upon receiving a test request signal, obtaining crawler tasks from a system task database and sending the crawler tasks to a crawler task dispatcher; while the crawler task dispatcher distributes tasks to a web crawler machine cluster, obtaining the total working time of each crawler machine in the cluster; and determining, according to the total working time of each crawler machine, whether the workload of the crawler machines in the cluster is balanced. The method improves the efficiency of testing a web crawler system, and the test result is accurate.

Description

Test method and device for a web crawler system, storage medium, and electronic device
Technical field
The present disclosure relates to the technical field of testing tools, and in particular to a test method for a web crawler system, a test device for a web crawler system, a computer-readable storage medium, and an electronic device.
Background
With the rapid development of networks, the Internet has become the carrier of a vast amount of information, and search engines, as tools that help people retrieve all kinds of information, have become users' entrance to and guide for the Internet.
A web crawler system, one of the key components of a search engine, is a system that extracts web pages automatically. It comprises a crawler task dispatcher and multiple crawler machines. The dispatcher distributes tasks to the crawler machines. After receiving a crawler task, a crawler machine starts from the URL (Uniform Resource Locator) of one or more initial pages and keeps extracting new URLs from the current page and placing them in a queue to be searched, until the system's stop condition is met. Because a web crawler system has to crawl a huge number of websites every day, its performance needs to be tested in order to understand how efficiently it works.
It should be noted that the information disclosed in the Background section above is intended only to aid understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
Embodiments of the present disclosure provide a test method for a web crawler system, a test device for a web crawler system, a computer-readable storage medium, and an electronic device.
According to a first aspect of the present disclosure, a test method for a web crawler system is provided, comprising:
upon receiving a test request signal, obtaining crawler tasks from a system task database and sending the crawler tasks to a crawler task dispatcher;
while the crawler task dispatcher distributes tasks to a web crawler machine cluster, obtaining the total working time of each crawler machine in the web crawler machine cluster; and
determining, according to the total working time of each crawler machine, whether the workload of the crawler machines in the web crawler machine cluster is balanced.
In an exemplary embodiment of the present disclosure, obtaining the total working time of each crawler machine in the web crawler machine cluster comprises:
each time a crawler machine receives a crawler task distributed by the crawler task dispatcher, recording the working time the crawler machine needs to complete that crawler task; and
when the crawler task dispatcher has distributed all tasks and all crawler tasks have been completed, calculating the total working time of each crawler machine from the working times it needed for its individual crawler tasks.
In an exemplary embodiment of the present disclosure, recording the working time the crawler machine needs to complete the crawler task comprises:
when the crawler machine receives the crawler task, starting timing at the moment the crawler machine begins its first crawl; and
ending timing once the crawler machine has completed a predetermined number of crawls for the crawler task, thereby obtaining the working time the crawler machine needed to complete the crawler task, and storing the working time in association with the crawler machine.
In an exemplary embodiment of the present disclosure, determining, according to the total working time of each crawler machine, whether the workload of the crawler machines in the web crawler machine cluster is balanced comprises:
sorting the total working times of the crawler machines in ascending order to obtain a working time sequence;
subtracting the first total working time in the working time sequence from the last total working time in the sequence to obtain a time difference;
dividing the time difference by the first total working time in the working time sequence to obtain the balance rate of the workload of the crawler machines in the web crawler machine cluster; and
judging, based on the balance rate, whether the workload of the crawler machines in the web crawler machine cluster is balanced.
In an exemplary embodiment of the present disclosure, judging, based on the balance rate, whether the workload of the crawler machines in the web crawler machine cluster is balanced comprises:
when the balance rate is less than or equal to a predetermined threshold, determining that the workload of the crawler machines in the web crawler machine cluster is balanced; and
when the balance rate is greater than the predetermined threshold, determining that the workload of the crawler machines in the web crawler machine cluster is unbalanced.
In an exemplary embodiment of the present disclosure, before the crawler tasks are obtained from the system task database, the method further comprises:
obtaining multiple uniform resource locators;
sending the multiple uniform resource locators to the web crawler machine cluster, so that the crawler machines in the cluster crawl each uniform resource locator, and recording the crawl results; and
when the number of crawl results reaches a predetermined quantity, storing all crawl results in the system task database as crawler tasks.
According to a second aspect of the present disclosure, a test device for a web crawler system is provided, comprising:
a task acquisition module, configured to obtain crawler tasks from a system task database upon receiving a test request signal, and to send the crawler tasks to a crawler task dispatcher;
a time recording module, configured to obtain the total working time of each crawler machine in a web crawler machine cluster while the crawler task dispatcher distributes tasks to the cluster; and
a judgment module, configured to determine, according to the total working time of each crawler machine, whether the workload of the crawler machines in the web crawler machine cluster is balanced.
In an exemplary embodiment of the present disclosure, the judgment module comprises:
a sorting unit, configured to sort the total working times of the crawler machines in ascending order to obtain a working time sequence;
a first computing unit, configured to subtract the first total working time in the working time sequence from the last total working time in the sequence to obtain a time difference;
a second computing unit, configured to divide the time difference by the first total working time in the working time sequence to obtain the balance rate of the workload of the crawler machines in the web crawler machine cluster; and
a judging unit, configured to judge, based on the balance rate, whether the workload of the crawler machines in the web crawler machine cluster is balanced.
According to a third aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the test method for a web crawler system described in any of the above.
According to a fourth aspect of the present disclosure, an electronic device is provided, comprising:
a processor; and
a memory on which a computer program is stored;
wherein the processor is configured to implement, by executing the computer program, the test method for a web crawler system described in any of the above.
The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects:
In the embodiments of the present disclosure, when a test request signal is received, crawler tasks are obtained from the system task database and sent to the crawler task dispatcher for distribution. While the dispatcher distributes tasks to the crawler machines in the web crawler machine cluster, the total working time each crawler machine spends on all of its crawler tasks is obtained, and from these total working times it is determined whether the workload of the crawler machines in the cluster is balanced. Because the test only requires computing each crawler machine's total working time during task distribution, the test process is simple and easy to carry out. A balanced result indicates that the resources of the web crawler system are fully utilized and its efficiency is high; an unbalanced result indicates that its resources are not fully utilized and its efficiency is low. The user can decide from the result whether the web crawler system needs debugging, which improves the efficiency with which the user tests the web crawler system.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The drawings herein are incorporated into and form part of this specification; they show embodiments consistent with the present disclosure and, together with the specification, serve to explain its principles. The drawings described below are evidently only some embodiments of the present disclosure, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 shows a schematic flowchart of a test method for a web crawler system according to an exemplary embodiment of the present disclosure.
Fig. 2 shows a schematic flowchart of step S130 in the test method for a web crawler system of Fig. 1 according to an exemplary embodiment of the present disclosure.
Fig. 3 shows a schematic flowchart of establishing the system task database, a further part of the test method for a web crawler system according to an exemplary embodiment of the present disclosure.
Fig. 4 shows a schematic block diagram of a test device for a web crawler system according to an exemplary embodiment of the present disclosure.
Fig. 5 shows a schematic block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Fig. 6 shows a schematic diagram of a computer-readable storage medium according to an exemplary embodiment of the present disclosure.
Detailed description of embodiments
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be understood as limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics can be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of the present disclosure. Those skilled in the art will appreciate, however, that the technical solutions of the present disclosure can be practiced with one or more of the specific details omitted, or with other methods, components, devices, steps, and so on. In other cases, well-known solutions are not shown or described in detail, to avoid obscuring aspects of the present disclosure.
In addition, the drawings are only schematic illustrations of the present disclosure and are not necessarily drawn to scale. Identical reference numerals in the drawings denote identical or similar parts, so repeated descriptions of them are omitted.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a test method for a web crawler system according to an exemplary embodiment of the present disclosure. The test method of the embodiment shown in Fig. 1 can run on any computing device, for example a terminal or a server, or on a server cluster or cloud server. Those skilled in the art may of course run the method of the present disclosure on other platforms as required, and the present disclosure places no particular limitation on this. As shown in Fig. 1, the test method for a web crawler system includes:
Step S110: upon receiving a test request signal, obtain crawler tasks from the system task database and send the crawler tasks to the crawler task dispatcher.
Here, a web crawler system is a system that automatically fetches web information according to predetermined rules. It includes a crawler task dispatcher and a web crawler machine cluster. The crawler task dispatcher distributes crawler tasks to the web crawler machine cluster, which includes multiple crawler machines. When the cluster receives a crawler task distributed by the dispatcher, a crawler machine crawls for that task.
A test request signal is a signal that requests the start of a test. In one example, the test request signal is sent when the user clicks a specific region of an interface, for example a test request button. In another example, the test request signal is sent at predetermined time intervals, such as every 8 hours, 12 hours, or 24 hours; this example places no particular limitation on the interval. For example, the test request signal can be configured to be sent at 18:00 every day to request the start of a test.
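The time-based trigger described above can be sketched as follows. This is a minimal illustration only: the function names `next_daily_trigger` and `next_interval_trigger` are hypothetical and do not come from the disclosure, which does not say how the signal is scheduled.

```python
from datetime import datetime, timedelta


def next_daily_trigger(now: datetime, hour: int = 18) -> datetime:
    """Return the next moment a daily test request signal would be sent.

    Assumption: the signal fires once a day at `hour`:00 (18:00 in the text's example).
    """
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed (or is exactly now), so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


def next_interval_trigger(last_sent: datetime, interval_hours: int = 8) -> datetime:
    """Return the next send time for a fixed-interval signal (8, 12, or 24 hours in the text)."""
    return last_sent + timedelta(hours=interval_hours)


daily = next_daily_trigger(datetime(2019, 5, 27, 17, 0))
periodic = next_interval_trigger(datetime(2019, 5, 27, 6, 0), interval_hours=12)
```

A real deployment would more likely hand this job to an existing scheduler; the sketch only makes the two timing rules concrete.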
The system task database is the database that stores the crawler tasks used to test the web crawler system. When a test request signal is received, crawler tasks are obtained from the system task database and sent to the crawler task dispatcher, which distributes them to the web crawler machine cluster. Multiple crawler tasks are obtained, and those skilled in the art can configure the number as needed, for example 1000, 2000, or 5000 crawler tasks; this example places no particular limitation on the number.
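As a rough sketch of step S110, the following uses an in-memory SQLite database to stand in for the system task database and a plain queue to stand in for the machine that distributes tasks. The table name `crawler_tasks`, its columns, and the function names are assumptions made for illustration; the disclosure does not specify a schema or a transport.

```python
import sqlite3
from collections import deque


def load_crawler_tasks(conn: sqlite3.Connection, limit: int) -> list:
    """Read up to `limit` crawler tasks (URLs) from the hypothetical task table."""
    rows = conn.execute(
        "SELECT url FROM crawler_tasks ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    return [url for (url,) in rows]


def on_test_request(conn: sqlite3.Connection, dispatcher_queue: deque, limit: int = 1000) -> int:
    """Handle a test request: fetch tasks and hand them over for distribution."""
    tasks = load_crawler_tasks(conn, limit)
    dispatcher_queue.extend(tasks)
    return len(tasks)


# Demo with five stored tasks, of which three are requested.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crawler_tasks (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO crawler_tasks (url) VALUES (?)",
    [(f"https://example.com/page/{i}",) for i in range(5)],
)
dispatcher_queue = deque()
sent = on_test_request(conn, dispatcher_queue, limit=3)
```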
Referring to Fig. 3, Fig. 3 is a schematic flowchart of establishing the system task database, a further part of the test method for a web crawler system according to an exemplary embodiment of the present disclosure. Before crawler tasks are obtained from the system task database, the test method further includes:
Step S310: obtain multiple uniform resource locators.
Here, a uniform resource locator (URL) is the address of a standard resource on the Internet. When carrying out a crawl task, a crawler machine starts from the URL of one or more initial pages and keeps extracting new URLs from the current page and placing them in a queue to be searched, until the system's stop condition is met. In one example, the uniform resource locators can be obtained by random search on the Internet.
Step S320: send the multiple uniform resource locators to the web crawler machine cluster, so that the crawler machines in the cluster crawl each uniform resource locator, and record the crawl results.
Here, the multiple uniform resource locators are sent to the web crawler machine cluster, the crawler machines in the cluster crawl each of them, and the crawl results of the crawler machines are recorded, so that a sufficient number of URLs is obtained and stored as crawler tasks.
Step S330: when the number of crawl results reaches a predetermined quantity, store all crawl results in the system task database as crawler tasks.
Here, the predetermined quantity is configured in advance and can be, for example, 1000, 2000, or 5000. When the crawl results reach the predetermined quantity, crawling stops and the recorded crawl results are stored in the system task database as crawler tasks for use in subsequent tests.
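Steps S310 to S330 can be sketched as a queue-based crawl that stops once a predetermined number of results has been recorded. To keep the example self-contained it crawls a small in-memory link graph rather than the network; `collect_crawl_results`, `fetch_links`, and the `LINKS` graph are hypothetical names introduced here, and treating each visited URL as one crawl result is an assumption.

```python
from collections import deque


def collect_crawl_results(seed_urls, fetch_links, predetermined_quantity):
    """Crawl from the seed URLs until `predetermined_quantity` results are recorded."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    results = []
    while queue and len(results) < predetermined_quantity:
        url = queue.popleft()
        results.append(url)  # record the crawl result
        for link in fetch_links(url):
            if link not in seen:  # enqueue each newly extracted URL once
                seen.add(link)
                queue.append(link)
    return results


# A toy link graph standing in for real pages.
LINKS = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": ["a"]}
tasks = collect_crawl_results(["a"], lambda url: LINKS.get(url, []), 4)
```

The returned list is what would then be written to the system task database as crawler tasks.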
Step S120: while the crawler task dispatcher distributes tasks to the web crawler machine cluster, obtain the total working time of each crawler machine in the cluster.
Here, the crawler task dispatcher distributes crawler tasks to the crawler machines in the web crawler machine cluster; when a crawler machine completes its current crawler task, the dispatcher distributes the next crawler task to that machine. The working time each crawler machine needs to complete each crawler task is recorded, and for each crawler machine the working times of its individual tasks are added together to obtain the total working time that machine spent completing crawler tasks.
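The distribution behavior in step S120 can be illustrated with a small simulation. It assumes, beyond what the text states, that every machine starts idle at time zero, that each task has a known duration, and that the next task always goes to whichever machine becomes free first; `simulate_dispatch` is a hypothetical name.

```python
import heapq


def simulate_dispatch(task_durations, machine_count):
    """Simulate handing each task to the first machine to finish its current one.

    Returns the total working time accumulated by each machine.
    """
    totals = [0.0] * machine_count
    # Heap of (time at which the machine becomes free, machine index).
    free_at = [(0.0, m) for m in range(machine_count)]
    heapq.heapify(free_at)
    for duration in task_durations:
        finished_at, machine = heapq.heappop(free_at)  # earliest-free machine
        totals[machine] += duration
        heapq.heappush(free_at, (finished_at + duration, machine))
    return totals


totals = simulate_dispatch([70, 98, 82, 40, 60], machine_count=2)
```

In a real test the durations are measured rather than known in advance; the simulation only shows how per-machine totals arise from this hand-out rule.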
In one exemplary embodiment, obtaining the total working time of each crawler machine in the web crawler machine cluster includes:
Each time a crawler machine receives a crawler task distributed by the crawler task dispatcher, recording the working time the crawler machine needs to complete that crawler task.
Here, when a crawler machine receives a crawler task distributed by the crawler task dispatcher, the moment the crawler machine starts crawling is taken as the start point and the moment it stops crawling as the end point, and the working time the crawler machine needed to complete the crawler task is recorded. For example, if a crawler machine starts crawling at 15:30 and stops at 15:35, completing the crawler task, the working time it needed to complete the task is 5 min.
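The start-point and end-point arithmetic in this example can be checked with a few lines of Python. The function name `working_time` and the calendar date are placeholders chosen for the illustration.

```python
from datetime import datetime, timedelta


def working_time(start: datetime, stop: datetime) -> timedelta:
    """Working time of one crawler task: stop moment minus start moment."""
    if stop < start:
        raise ValueError("stop must not be earlier than start")
    return stop - start


# The example from the text: crawling starts at 15:30 and stops at 15:35.
elapsed = working_time(datetime(2019, 5, 27, 15, 30), datetime(2019, 5, 27, 15, 35))
minutes = elapsed.total_seconds() / 60
```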
In one exemplary embodiment, recording the working time the crawler machine needs to complete the crawler task includes:
When the crawler machine receives the crawler task, starting timing at the moment the crawler machine begins its first crawl;
Ending timing once the crawler machine has completed a predetermined number of crawls for the crawler task, thereby obtaining the working time the crawler machine needed to complete the crawler task, and storing the working time in association with the crawler machine.
In this embodiment, the working time the crawler machine needs to complete the crawler task is obtained by direct timing, so the working time is available immediately without extra calculation, which reduces unnecessary power consumption.
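One way to picture this timing scheme is a small timer that starts just before the first crawl, stops after a predetermined number of crawls, and files the result under the machine's identifier. The class `TaskTimer`, its method names, and the injectable `clock` parameter are assumptions made for this sketch; the disclosure does not describe an implementation.

```python
import time
from collections import defaultdict


class TaskTimer:
    """Time crawler tasks and store each working time per crawler machine."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so the timer can be tested deterministically
        self.working_times = defaultdict(list)  # machine id -> list of working times

    def run_task(self, machine_id, crawl_once, predetermined_crawls):
        start = self._clock()  # timing starts as the first crawl begins
        for _ in range(predetermined_crawls):
            crawl_once()
        elapsed = self._clock() - start  # timing ends after the last crawl
        self.working_times[machine_id].append(elapsed)
        return elapsed


# Demo with a fake clock that reports 10.0 s at the start and 12.5 s at the end.
ticks = iter([10.0, 12.5])
timer = TaskTimer(clock=lambda: next(ticks))
crawl_calls = []
elapsed_seconds = timer.run_task("machine-1", lambda: crawl_calls.append(1), 3)
```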
When the crawler task dispatcher has distributed all tasks and all crawler tasks have been completed, calculating the total working time of each crawler machine from the working times it needed for its individual crawler tasks.
Here, for each crawler machine, the working times it needed for its individual crawler tasks are added together to obtain its total working time. For example, if a crawler machine completes three crawler tasks whose working times are 70 s, 98 s, and 82 s, its total working time is 250 s.
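The summation can be sketched as a per-machine accumulation over recorded (machine, working time) pairs. The record layout and the name `total_working_times` are illustrative assumptions.

```python
def total_working_times(records):
    """Sum the per-task working times (in seconds) of each crawler machine."""
    totals = {}
    for machine_id, seconds in records:
        totals[machine_id] = totals.get(machine_id, 0) + seconds
    return totals


# The example from the text: one machine, three tasks of 70 s, 98 s, and 82 s.
records = [("machine-1", 70), ("machine-1", 98), ("machine-1", 82)]
totals_by_machine = total_working_times(records)
```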
Step S130: determine, according to the total working time of each crawler machine, whether the workload of the crawler machines in the web crawler machine cluster is balanced.
Here, the longer a crawler machine's total working time, the larger its workload. The workload of the crawler machines in the web crawler machine cluster can therefore be derived from their total working times, and from the workload of each crawler machine it can be determined whether the workload in the cluster is balanced. An unbalanced result indicates that some crawler machines have been idle for long periods, that is, the crawler machines are not being scheduled sensibly, which lowers the working efficiency of the cluster. The user can debug the web crawler system according to the result so that its capacity is fully used and crawling efficiency improves.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of step S130 in the test method for a web crawler system of Fig. 1 according to an exemplary embodiment of the present disclosure. In the embodiment shown in Fig. 2, determining, according to the total working time of each crawler machine, whether the workload of the crawler machines in the web crawler machine cluster is balanced includes:
Step S210: sort the total working times of the crawler machines in ascending order to obtain a working time sequence;
Step S220: subtract the first total working time in the working time sequence from the last total working time in the sequence to obtain a time difference;
Step S230: divide the time difference by the first total working time in the working time sequence to obtain the balance rate of the workload of the crawler machines in the web crawler machine cluster;
Step S240: judge, based on the balance rate, whether the workload of the crawler machines in the web crawler machine cluster is balanced.
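Steps S210 to S240 can be summarized in one formula. The symbols below are introduced here for convenience and do not appear in the original text: $T_{(1)} \le \dots \le T_{(n)}$ are the sorted total working times of the $n$ crawler machines, $\Delta$ is the time difference, $r$ is the rate computed in step S230, and $\theta$ is the predetermined threshold. The formula assumes $T_{(1)} > 0$.

```latex
\Delta = T_{(n)} - T_{(1)}, \qquad
r = \frac{\Delta}{T_{(1)}} = \frac{T_{(n)} - T_{(1)}}{T_{(1)}}, \qquad
\text{workload is balanced} \iff r \le \theta .
```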
In this example embodiment, the total working times of the crawler machines are sorted in ascending order. For example, suppose the web crawler machine cluster includes 4 crawler machines whose total working times are 125 s, 113 s, 98 s, and 136 s; sorting them in ascending order gives the working time sequence (98, 113, 125, 136). Based on this sequence, the first total working time in the sequence is subtracted from the last, that is, the minimum value in the sequence is subtracted from the maximum value, to obtain the time difference. For the working time sequence (98, 113, 125, 136), the time difference is 136 - 98 = 38.
The time difference is then divided by the first total working time in the working time sequence to obtain the ratio of the time difference to that first total working time. This ratio is the balance rate of the workload of the crawler machines in the web crawler machine cluster. For example, for the 4 crawler machines with the working time sequence (98, 113, 125, 136), the time difference is 136 - 98 = 38, and dividing it by the first total working time in the sequence gives a balance rate of 38/98 ≈ 38.78%.
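The worked example can be reproduced with a short function that follows the sort, subtract, and divide steps. The function name `balance_rate` and its input checks are additions for illustration; the text does not say how an empty cluster or a zero working time should be handled.

```python
def balance_rate(total_working_times):
    """Compute (longest - shortest) / shortest over the machines' total working times."""
    if not total_working_times:
        raise ValueError("at least one crawler machine is required")
    ordered = sorted(total_working_times)  # ascending working time sequence
    shortest, longest = ordered[0], ordered[-1]
    if shortest <= 0:
        raise ValueError("the shortest total working time must be positive")
    time_difference = longest - shortest
    return time_difference / shortest


# The example from the text: four machines with 125 s, 113 s, 98 s, and 136 s.
rate = balance_rate([125, 113, 98, 136])
rate_percent = round(rate * 100, 2)
```

Because the rate is normalized by the shortest time, it grows quickly when one machine is nearly idle, which is the situation the test is meant to expose.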
The balance rate directly expresses how much more work the crawler machine with the longest total working time did than the crawler machine with the shortest total working time, relative to the workload of the latter. The larger the balance rate, the more the workload of the longest-running crawler machine exceeds that of the shortest-running one, that is, the less balanced the workload of the crawler machines in the web crawler machine cluster. Conversely, the smaller the balance rate, the smaller the excess, that is, the more balanced the workload of the crawler machines in the cluster.
In one exemplary embodiment, judging, based on the balance rate, whether the workload of the crawler machines in the web crawler machine cluster is balanced includes:
When the balance rate is less than or equal to a predetermined threshold, determining that the workload of the crawler machines in the web crawler machine cluster is balanced;
When the balance rate is greater than the predetermined threshold, determining that the workload of the crawler machines in the web crawler machine cluster is unbalanced.
Here, the predetermined threshold is configured in advance and can be, for example, 10%, 20%, or 25%; this example places no particular limitation on it. In one example, the predetermined threshold is obtained through a user device such as a mobile phone or computer. The user device shows the user a dedicated input interface, and the threshold is obtained when the user triggers a specific function on that interface. For example, the user clicks a "predetermined threshold input" button on the interface, an input box appears, and the user enters the predetermined threshold in the input box using an input device such as a keyboard or touch screen.
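The threshold comparison, together with a simple way of reading a user-entered percentage, might look like the sketch below. Both function names are hypothetical, and interpreting the typed value as a percentage (with or without a trailing percent sign) is an assumption; the text only says the user enters the threshold in an input box.

```python
def parse_threshold(text: str) -> float:
    """Convert a user-entered percentage such as '20%' or '20' into a fraction."""
    value = float(text.strip().rstrip("%").strip())
    if value < 0:
        raise ValueError("the threshold must not be negative")
    return value / 100


def is_balanced(rate: float, threshold: float) -> bool:
    """Balanced when the rate is less than or equal to the predetermined threshold."""
    return rate <= threshold


threshold = parse_threshold("25%")
example_result = is_balanced(38 / 98, threshold)  # the 38.78% example against a 25% threshold
```

With the figures from the earlier worked example, a rate of about 38.78% exceeds a 25% threshold, so that cluster would be judged unbalanced.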
An embodiment of the present disclosure further provides a test device for a web crawler system. Referring to Fig. 4, the test device for a web crawler system may include a task acquisition module 410, a time recording module 420, and a judgment module 430, where:
The task acquisition module 410 is configured to: upon receiving a test request signal, obtain crawler tasks from the system task database and send the crawler tasks to the crawler task dispatcher;
The time recording module 420 is configured to: while the crawler task dispatcher distributes tasks to the web crawler machine cluster, obtain the total working time of each crawler machine in the cluster;
The judgment module 430 is configured to: determine, according to the total working time of each crawler machine, whether the workload of the crawler machines in the web crawler machine cluster is balanced.
In an exemplary embodiment, the judgment module 430 further includes a sorting unit 431, a first computing unit 432, a second computing unit 433, and a judging unit 434, where:
The sorting unit 431 is used to sort the total working times of the crawler machines in ascending order to obtain a working time sequence;
The first computing unit 432 is used to subtract the first total working time in the working time sequence from the last total working time in the sequence to obtain a time difference;
The second computing unit 433 is used to divide the time difference by the first total working time in the working time sequence to obtain the balance rate of the workload of the crawler machines in the web crawler machine cluster;
The judging unit 434 is used to judge, based on the balance rate, whether the workload of the crawler machines in the web crawler machine cluster is balanced.
The detail of each module is in corresponding network crawler system in the test device of above-mentioned network crawler system Test method in be described in detail, therefore details are not described herein again.
It should be noted that although being referred to several modules or list for acting the equipment executed in the above detailed description Member, but this division is not enforceable.In fact, according to embodiment of the present disclosure, it is above-described two or more Module or the feature and function of unit can embody in a module or unit.Conversely, an above-described mould The feature and function of block or unit can be to be embodied by multiple modules or unit with further division.
In addition, although the steps of the method of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all of the steps shown must be performed, to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
From the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Accordingly, the technical solution according to embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a mobile terminal, or a network device) to execute the method according to embodiments of the present disclosure.
Those of ordinary skill in the art will understand that aspects of the present invention may be implemented as a system, a method, or a program product. Accordingly, aspects of the present invention may take the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, and the like), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit", "module", or "system".
According to an exemplary embodiment, the solution may be implemented as an electronic device that includes a memory and a processor. A computer program is stored in the memory. When executed by the processor, the computer program causes the processor to perform any of the method embodiments described above, or causes the electronic device to implement the functions of the constituent units/modules of the device embodiments described above.
The processor in the above embodiment may be a single processing unit, such as a central processing unit (CPU), or a distributed processor system comprising multiple distributed processing units.
The memory in the above embodiment may include one or more memories. It may be internal memory of the computing device, such as various transient or non-transient memories, or external storage connected to the computing device through a memory interface.
The electronic device 500 according to this embodiment of the present invention is described below with reference to Fig. 5. The electronic device 500 shown in Fig. 5 is only an example and should not impose any limitation on the function or scope of use of embodiments of the present invention.
As shown in Fig. 5, the electronic device 500 takes the form of a general-purpose computing device. Its components may include, but are not limited to: the at least one processing unit 510, the at least one storage unit 520, and a bus 530 connecting the different system components (including the storage unit 520 and the processing unit 510).
The storage unit stores program code that can be executed by the processing unit 510, so that the processing unit 510 performs the steps of the various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification. For example, the processing unit 510 may perform the steps shown in Fig. 1: step S110, when a test request signal is received, obtaining crawler tasks from the system task database and sending the crawler tasks to the crawler task dispatcher; step S120, when the crawler task dispatcher distributes tasks to the web crawler machine cluster, obtaining the total working time of each crawler machine in the web crawler machine cluster; and step S130, obtaining, according to the total working time of each crawler machine, a judgment result indicating whether the workloads of the crawler machines in the web crawler machine cluster are balanced.
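For illustration, steps S110 to S130 can be sketched end to end in Python with a simulated cluster. Everything named here is hypothetical: the function `run_balance_test`, the round-robin distribution policy, the `cost_of` callback that stands in for measured crawl time, and the sample URLs. The disclosure does not specify how the dispatcher assigns tasks, so round-robin is only an assumption.

```python
from collections import defaultdict
from itertools import cycle


def run_balance_test(task_db, machines, cost_of, threshold):
    """Simulate steps S110-S130 and return (totals, rate, balanced)."""
    # S110: obtain the crawler tasks from the (simulated) task database.
    tasks = list(task_db)

    # S120: distribute tasks (round-robin is an assumption) and add each
    # per-task working time to the receiving machine's running total.
    totals = defaultdict(float)
    for task, machine in zip(tasks, cycle(machines)):
        totals[machine] += cost_of(machine, task)

    # S130: sort the totals, then compute (last - first) / first.
    ordered = sorted(totals[m] for m in machines)
    if ordered[0] <= 0:
        # Assumption: a machine that did no measurable work is an error.
        raise ValueError("every machine needs a positive total working time")
    rate = (ordered[-1] - ordered[0]) / ordered[0]
    return dict(totals), rate, rate <= threshold


# Hypothetical data: six tasks, two machines with fixed seconds per task.
seconds_per_task = {"m1": 2.0, "m2": 3.0}
urls = [f"https://example.com/page/{i}" for i in range(6)]
totals, rate, balanced = run_balance_test(
    urls,
    ["m1", "m2"],
    cost_of=lambda machine, task: seconds_per_task[machine],
    threshold=0.2,
)
print(totals, rate, balanced)
```

In a real test the `cost_of` callback would be replaced by timing recorded on each crawler machine, as described for the time recording module.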
The storage unit 520 may include readable media in the form of volatile storage units, such as a random access memory (RAM) unit 5201 and/or a cache memory unit 5202, and may further include a read-only memory (ROM) unit 5203.
The storage unit 520 may also include a program/utility 5204 having a set of (at least one) program modules 5205. Such program modules 5205 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 530 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 500 may also communicate with one or more external devices 700 (such as a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 500, and/or with any device (such as a router or a modem) that enables the electronic device 500 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 550. The electronic device 500 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 560. As shown, the network adapter 560 communicates with the other modules of the electronic device 500 through the bus 530. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Accordingly, the technical solution according to embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a terminal device, or a network device) to execute the method according to embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium is also provided, on which is stored a program product capable of implementing the method described above in this specification. In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product that includes program code. When the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification.
Referring to Fig. 6, a program product 600 for implementing the above method according to an embodiment of the present invention is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may run on a terminal device such as a personal computer. However, the program product of the present invention is not limited to this. In this document, a readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in connection with an instruction execution system, apparatus, or device.
The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
The program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
Program code for performing the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a standalone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In addition, the above drawings are only schematic illustrations of the processing included in the method according to exemplary embodiments of the present invention, and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the chronological order of these processes. It is also easy to understand that these processes may be executed, for example, synchronously or asynchronously in multiple modules.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the claims.

Claims (10)

1. A test method for a web crawler system, characterized by comprising:
when a test request signal is received, obtaining crawler tasks from a system task database, and sending the crawler tasks to a crawler task dispatcher;
when the crawler task dispatcher distributes tasks to a web crawler machine cluster, obtaining the total working time of each crawler machine in the web crawler machine cluster;
obtaining, according to the total working time of each crawler machine, a judgment result indicating whether the workloads of the crawler machines in the web crawler machine cluster are balanced.
2. The test method for a web crawler system according to claim 1, characterized in that obtaining the total working time of each crawler machine in the web crawler machine cluster comprises:
each time a crawler machine receives a crawler task distributed by the crawler task dispatcher, recording the working time the crawler machine requires to complete that crawler task;
when the crawler task dispatcher has finished distributing tasks and all crawler tasks have been completed, calculating the total working time of each crawler machine based on the working time that crawler machine required to complete each of its crawler tasks.
3. The test method for a web crawler system according to claim 2, characterized in that recording the working time the crawler machine requires to complete the crawler task comprises:
starting timing when the crawler machine, having received the crawler task, begins its first crawl;
ending timing after the crawler machine has completed a predetermined number of crawls for the crawler task, to obtain the working time the crawler machine requires to complete the crawler task, and storing the working time in association with the crawler machine.
4. The test method for a web crawler system according to claim 1, characterized in that obtaining, according to the total working time of each crawler machine, a judgment result indicating whether the workloads of the crawler machines in the web crawler machine cluster are balanced comprises:
sorting the total working times of the crawler machines in ascending order, to obtain a working time sequence;
subtracting the first total working time in the obtained working time sequence from the last total working time in that sequence, to obtain a time difference;
dividing the time difference by the first total working time in the working time sequence, to obtain the balance rate of the workloads of the crawler machines in the web crawler machine cluster;
judging, based on the balance rate, whether the workloads of the crawler machines in the web crawler machine cluster are balanced.
5. The test method for a web crawler system according to claim 4, characterized in that judging, based on the balance rate, whether the workloads of the crawler machines in the web crawler machine cluster are balanced comprises:
when the balance rate is less than or equal to a predetermined threshold, determining that the workloads of the crawler machines in the web crawler machine cluster are balanced;
when the balance rate is greater than the predetermined threshold, determining that the workloads of the crawler machines in the web crawler machine cluster are unbalanced.
6. The test method for a web crawler system according to claim 1, characterized in that, before obtaining crawler tasks from the system task database, the method further comprises:
obtaining multiple uniform resource locators;
sending the multiple uniform resource locators to the web crawler machine cluster, so that the crawler machines in the web crawler machine cluster crawl each uniform resource locator, and recording the crawl results;
when the number of crawl results reaches a predetermined number, storing all crawl results in the system task database as crawler tasks.
7. A test device for a web crawler system, characterized by comprising:
a task acquisition module, configured to obtain crawler tasks from a system task database when a test request signal is received, and to send the crawler tasks to a crawler task dispatcher;
a time recording module, configured to obtain the total working time of each crawler machine in a web crawler machine cluster when the crawler task dispatcher distributes tasks to the web crawler machine cluster;
a judgment module, configured to obtain, according to the total working time of each crawler machine, a judgment result indicating whether the workloads of the crawler machines in the web crawler machine cluster are balanced.
8. The test device for a web crawler system according to claim 7, characterized in that the judgment module comprises:
a sorting unit, for sorting the total working times of the crawler machines in ascending order, to obtain a working time sequence;
a first computing unit, for subtracting the first total working time in the obtained working time sequence from the last total working time in that sequence, to obtain a time difference;
a second computing unit, for dividing the time difference by the first total working time in the working time sequence, to obtain the balance rate of the workloads of the crawler machines in the web crawler machine cluster;
a judging unit, for judging, based on the balance rate, whether the workloads of the crawler machines in the web crawler machine cluster are balanced.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the test method for a web crawler system according to any one of claims 1-6.
10. An electronic device, characterized by comprising:
a processor; and
a memory on which a computer program is stored;
wherein the processor is configured to implement, by executing the computer program, the test method for a web crawler system according to any one of claims 1-6.
CN201910444805.8A 2019-05-24 2019-05-24 The test method and device of network crawler system, storage medium, electronic equipment Pending CN110333980A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910444805.8A CN110333980A (en) 2019-05-24 2019-05-24 The test method and device of network crawler system, storage medium, electronic equipment
PCT/CN2019/123059 WO2020238131A1 (en) 2019-05-24 2019-12-04 Web crawler system testing method and apparatus, storage medium, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910444805.8A CN110333980A (en) 2019-05-24 2019-05-24 The test method and device of network crawler system, storage medium, electronic equipment

Publications (1)

Publication Number Publication Date
CN110333980A true CN110333980A (en) 2019-10-15

Family

ID=68140378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910444805.8A Pending CN110333980A (en) 2019-05-24 2019-05-24 The test method and device of network crawler system, storage medium, electronic equipment

Country Status (2)

Country Link
CN (1) CN110333980A (en)
WO (1) WO2020238131A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020238131A1 (en) * 2019-05-24 2020-12-03 深圳壹账通智能科技有限公司 Web crawler system testing method and apparatus, storage medium, and electronic device
CN115328812A (en) * 2022-10-11 2022-11-11 深圳华锐分布式技术股份有限公司 UI (user interface) testing method, device, equipment and medium based on web crawler

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040225644A1 (en) * 2003-05-09 2004-11-11 International Business Machines Corporation Method and apparatus for search engine World Wide Web crawling
CN106202108A (en) * 2015-05-06 2016-12-07 阿里巴巴集团控股有限公司 Web crawlers captures method for allocating tasks and device and data grab method and device
CN106648445A (en) * 2015-10-30 2017-05-10 北京国双科技有限公司 Data storage method and apparatus used for crawler
CN107071009A (en) * 2017-03-28 2017-08-18 江苏飞搏软件股份有限公司 A kind of distributed big data crawler system of load balancing
CN107203623A (en) * 2017-05-26 2017-09-26 山东省科学院情报研究所 The load balancing adjusting method of network crawler system
CN107562541A (en) * 2017-09-05 2018-01-09 广东科杰通信息科技有限公司 A kind of distributed reptile method of load balancing, crawler system
CN108205541A (en) * 2016-12-16 2018-06-26 北大方正集团有限公司 The dispatching method and device of distributed network reptile task

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9253154B2 (en) * 2008-08-12 2016-02-02 Mcafee, Inc. Configuration management for a capture/registration system
CN110333980A (en) * 2019-05-24 2019-10-15 深圳壹账通智能科技有限公司 The test method and device of network crawler system, storage medium, electronic equipment

Also Published As

Publication number Publication date
WO2020238131A1 (en) 2020-12-03

Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information

Address after: 201, room 518000, building A, No. 1, front Bay Road, Qianhai Shenzhen Guangdong Shenzhen Hong Kong cooperation zone (Qianhai business secretary)

Applicant after: ONECONNECT FINANCIAL TECHNOLOGY Co.,Ltd. (SHANGHAI)

Address before: 518000 Guangdong city of Shenzhen province Qianhai Shenzhen Hong Kong cooperation zone before Bay Road No. 1 building 201 room A

Applicant before: ONECONNECT FINANCIAL TECHNOLOGY Co.,Ltd. (SHANGHAI)

SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191015

WD01 Invention patent application deemed withdrawn after publication