CN110333980A - The test method and device of network crawler system, storage medium, electronic equipment - Google Patents
- Publication number
- CN110333980A (application number CN201910444805.8A)
- Authority
- CN
- China
- Prior art keywords
- crawler
- task
- machine
- machines
- clusters
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Quality & Reliability (AREA)
- Debugging And Monitoring (AREA)
Abstract
This disclosure relates to a test method and device for a web crawler system, in the technical field of testing tools. The method comprises: upon receiving a test request signal, obtaining crawler tasks from a system task database and sending them to a crawler task distributor; while the crawler task distributor distributes tasks to a web crawler machine cluster, obtaining the total working time of each crawler machine in the cluster; and, based on the total working time of each crawler machine, obtaining a judgment result indicating whether the workload of the crawler machines in the cluster is balanced. The method improves the efficiency of testing web crawler systems and yields accurate test results.
Description
Technical field
This disclosure relates to the technical field of testing tools, and in particular to a test method for a web crawler system, a test device for a web crawler system, a computer-readable storage medium, and an electronic device.
Background art
With the rapid development of networks, the Internet has become a carrier of massive amounts of information, and search engines, as tools that help people retrieve all kinds of information, have become the entry point and guide through which users access the Internet.
The web crawler system, one of the important components of a search engine, is a system that automatically extracts web pages. It includes a crawler task distributor and multiple crawler machines; the crawler task distributor distributes tasks to the crawler machines. After receiving a crawler task, a crawler machine starts from the URLs (Uniform Resource Locators) of one or several initial pages and continually extracts new URLs from the current page into a queue for searching, until a stopping condition of the system is met. Because a web crawler system must crawl an enormous number of websites every day, its performance needs to be tested in order to understand its working efficiency.
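The crawl loop described above can be sketched as a breadth-first traversal over a URL queue. This is a minimal, hedged illustration rather than the disclosure's implementation: page fetching is replaced by a hypothetical in-memory link graph (`LINKS`), and the system's stopping condition is assumed to be a maximum page count (`max_pages`).

```python
from collections import deque


def crawl(seed_urls, get_links, max_pages):
    """Breadth-first crawl starting from one or several seed URLs.

    get_links: callable returning the URLs linked from a page; a stand-in
               for fetching and parsing the page (hypothetical).
    max_pages: assumed stopping condition (maximum number of pages visited).
    """
    queue = deque(seed_urls)  # frontier of URLs waiting to be crawled
    seen = set(seed_urls)     # never enqueue the same URL twice
    visited = []              # order in which pages were crawled
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):  # extract new URLs from the current page
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical link graph standing in for real web pages.
LINKS = {
    "http://a.example/": ["http://a.example/1", "http://a.example/2"],
    "http://a.example/1": ["http://a.example/3"],
    "http://a.example/2": ["http://a.example/1"],
}

print(crawl(["http://a.example/"], lambda u: LINKS.get(u, []), max_pages=10))
```

The `seen` set keeps the queue free of duplicates even when pages link back to each other, as `/2` does to `/1` here.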
It should be noted that the information disclosed in the Background section above is only intended to enhance understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
Embodiments of the present disclosure provide a test method for a web crawler system, a test device for a web crawler system, a computer-readable storage medium, and an electronic device.
According to a first aspect of the present disclosure, a test method for a web crawler system is provided, comprising:
upon receiving a test request signal, obtaining crawler tasks from a system task database and sending the crawler tasks to a crawler task distributor;
while the crawler task distributor distributes tasks to a web crawler machine cluster, obtaining the total working time of each crawler machine in the web crawler machine cluster; and
based on the total working time of each crawler machine, obtaining a judgment result indicating whether the workload of the crawler machines in the web crawler machine cluster is balanced.
In an exemplary embodiment of the present disclosure, obtaining the total working time of each crawler machine in the web crawler machine cluster includes:
when each crawler machine receives a crawler task distributed by the crawler task distributor, recording the working time the crawler machine needs to complete that crawler task; and
when the crawler task distributor has finished distributing tasks and all crawler tasks have been completed, calculating the total working time of each crawler machine from the working times it needed to complete each of its crawler tasks.
In an exemplary embodiment of the present disclosure, recording the working time the crawler machine needs to complete the crawler task includes:
starting a timer when the crawler machine, having received the crawler task, begins crawling for the first time; and
stopping the timer when the crawler machine has crawled the crawler task a predetermined number of times, thereby obtaining the working time the crawler machine needed to complete the crawler task, and storing the working time in association with the crawler machine.
In an exemplary embodiment of the present disclosure, obtaining, based on the total working time of each crawler machine, the judgment result indicating whether the workload of the crawler machines in the web crawler machine cluster is balanced includes:
sorting the total working times of the crawler machines in ascending order to obtain a working time sequence;
subtracting the first total working time in the working time sequence from the last total working time to obtain a time difference;
dividing the time difference by the first total working time in the working time sequence to obtain the workload balance rate of the crawler machines in the web crawler machine cluster; and
judging, based on the balance rate, whether the workload of the crawler machines in the web crawler machine cluster is balanced.
In an exemplary embodiment of the present disclosure, judging, based on the balance rate, whether the workload of the crawler machines in the web crawler machine cluster is balanced includes:
when the balance rate is less than or equal to a predetermined threshold, determining that the workload of the crawler machines in the web crawler machine cluster is balanced; and
when the balance rate is greater than the predetermined threshold, determining that the workload of the crawler machines in the web crawler machine cluster is unbalanced.
In an exemplary embodiment of the present disclosure, before the crawler tasks are obtained from the system task database, the method further includes:
obtaining multiple uniform resource locators;
sending the multiple uniform resource locators to the web crawler machine cluster, where the crawler machines in the cluster crawl each uniform resource locator, and recording the crawl results; and
when the number of crawl results reaches a predetermined quantity, storing all the crawl results as crawler tasks in the system task database.
According to a second aspect of the present disclosure, a test device for a web crawler system is provided, comprising:
a task acquisition module, configured to obtain crawler tasks from a system task database upon receiving a test request signal and to send the crawler tasks to a crawler task distributor;
a time recording module, configured to obtain the total working time of each crawler machine in a web crawler machine cluster while the crawler task distributor distributes tasks to the web crawler machine cluster; and
a judgment module, configured to obtain, based on the total working time of each crawler machine, a judgment result indicating whether the workload of the crawler machines in the web crawler machine cluster is balanced.
In an exemplary embodiment of the present disclosure, the judgment module includes:
a sorting unit, configured to sort the total working times of the crawler machines in ascending order to obtain a working time sequence;
a first computing unit, configured to subtract, in the obtained working time sequence, the first total working time from the last total working time to obtain a time difference;
a second computing unit, configured to divide the time difference by the first total working time in the working time sequence to obtain the workload balance rate of the crawler machines in the web crawler machine cluster; and
a judging unit, configured to judge, based on the balance rate, whether the workload of the crawler machines in the web crawler machine cluster is balanced.
According to a third aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the test method for a web crawler system according to any one of the above.
According to a fourth aspect of the present disclosure, an electronic device is provided, comprising:
a processor; and
a memory on which a computer program is stored;
wherein the processor is configured to implement the test method for a web crawler system according to any one of the above by executing the computer program.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
In the embodiments of the present disclosure, upon receiving a test request signal, crawler tasks are obtained from the system task database and sent to the crawler task distributor for distribution. While the crawler task distributor distributes tasks to the crawler machines in the web crawler machine cluster, the total working time each crawler machine spends on all of its crawler tasks is obtained, and from these total working times a judgment result is obtained indicating whether the workload of the crawler machines in the cluster is balanced. The test process is simple and easy to carry out. A balanced result indicates that the resources of the web crawler system are fully utilized and its efficiency is high; an unbalanced result indicates that its resources are not fully utilized and its efficiency is low. Based on the judgment result, the user can decide whether the web crawler system needs to be adjusted, which makes testing the web crawler system more efficient for the user.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings herein are incorporated into and form part of this specification, illustrate embodiments consistent with the present disclosure, and together with the specification serve to explain the principles of the present disclosure. Evidently, the drawings in the following description show only some embodiments of the present disclosure; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a test method for a web crawler system according to an exemplary embodiment of the present disclosure.
Fig. 2 is a schematic flowchart of step S130 in the test method for a web crawler system of Fig. 1 according to an exemplary embodiment of the present disclosure.
Fig. 3 is a schematic flowchart of establishing the system task database, additionally included in a test method for a web crawler system according to an exemplary embodiment of the present disclosure.
Fig. 4 is a schematic block diagram of a test device for a web crawler system according to an exemplary embodiment of the present disclosure.
Fig. 5 is a schematic block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Fig. 6 is a schematic diagram of a computer-readable storage medium according to an exemplary embodiment of the present disclosure.
Detailed description of the embodiments
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be more thorough and complete and will fully convey the concepts of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are given to provide a thorough understanding of the embodiments of the present disclosure. Those skilled in the art will appreciate, however, that the technical solutions of the present disclosure may be practiced without one or more of these specific details, or with other methods, components, devices, steps, and so on. In other cases, well-known technical solutions are not shown or described in detail to avoid obscuring aspects of the present disclosure.
In addition, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so their repeated description is omitted.
Referring to Fig. 1, which is a schematic flowchart of a test method for a web crawler system according to an exemplary embodiment of the present disclosure, the embodiment of Fig. 1 provides a test method for a web crawler system. The method can run on any computing device, for example on a terminal or a server, or on a server cluster or a cloud server; those skilled in the art may also run the method of the invention on other platforms as needed, and the present disclosure places no particular limitation on this. As shown in Fig. 1, the test method for the web crawler system includes:
Step S110: upon receiving a test request signal, obtaining crawler tasks from the system task database and sending the crawler tasks to the crawler task distributor.
Here, a web crawler system is a system that automatically fetches web information according to predetermined rules. It includes a crawler task distributor and a web crawler machine cluster; the crawler task distributor distributes crawler tasks to the cluster, which includes multiple crawler machines. When the web crawler machine cluster receives crawler tasks distributed by the crawler task distributor, its crawler machines crawl those tasks.
The test request signal is a signal requesting that a test start. In one example, it may be sent when the user clicks a specific region of an interface, such as a test request button. In another example, it may be sent at predetermined time intervals, for example every 8, 12, or 24 hours; this example places no particular limitation on this. For instance, the test request signal may be configured to be sent at 18:00 every day to request the start of a test.
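A scheduled test request could be computed as in the sketch below. It is only an illustration under assumptions: the function names are hypothetical, times are naive local datetimes, and a request that falls due exactly at the current moment is deferred to the next day.

```python
from datetime import datetime, time, timedelta


def next_trigger(now: datetime, at: time = time(18, 0)) -> datetime:
    """Next moment at which a daily test request should be sent (default 18:00)."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow
    return candidate


def next_interval_trigger(last: datetime, hours: int = 12) -> datetime:
    """Fixed-interval variant: send a request every `hours` hours (e.g. 8, 12, 24)."""
    return last + timedelta(hours=hours)


print(next_trigger(datetime(2019, 5, 24, 17, 0)))
```

Deferring on `candidate <= now` avoids firing twice if the scheduler wakes up exactly at 18:00 and calls the function again.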
The system task database is a database that stores the crawler tasks used to test the web crawler system. When the test request signal is received, crawler tasks are obtained from the system task database and sent to the crawler task distributor, which distributes them to the web crawler machine cluster. There are multiple crawler tasks, and their number can be configured by those skilled in the art according to actual needs, for example 1,000, 2,000, or 5,000 crawler tasks; this example places no particular limitation on this.
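Step S110 might look like the sketch below, assuming the system task database is a SQL table and the crawler task distributor accepts tasks through a queue. The table name `crawler_tasks`, its columns, and the use of `sqlite3` and `queue.Queue` are illustrative assumptions, not details from the disclosure.

```python
import queue
import sqlite3


def load_tasks(conn: sqlite3.Connection, limit: int) -> list:
    """Read up to `limit` crawler tasks (here: URLs) from the task table."""
    # Hypothetical schema: crawler_tasks(id INTEGER PRIMARY KEY, url TEXT)
    rows = conn.execute(
        "SELECT url FROM crawler_tasks ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    return [url for (url,) in rows]


def on_test_request(conn: sqlite3.Connection, distributor: queue.Queue, limit: int = 1000) -> int:
    """Step S110 sketch: fetch tasks and hand them to the distributor's queue."""
    tasks = load_tasks(conn, limit)
    for task in tasks:
        distributor.put(task)
    return len(tasks)


# Usage with an in-memory database holding five sample tasks.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE crawler_tasks (id INTEGER PRIMARY KEY, url TEXT)")
db.executemany(
    "INSERT INTO crawler_tasks (url) VALUES (?)",
    [(f"http://site{i}.example/",) for i in range(5)],
)
print(on_test_request(db, queue.Queue(), limit=3))
```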
Referring to Fig. 3, which is a schematic flowchart of establishing the system task database, additionally included in a test method for a web crawler system according to an exemplary embodiment of the present disclosure, before the crawler tasks are obtained from the system task database, the test method for the web crawler system further includes:
Step S310: obtaining multiple uniform resource locators.
Here, a uniform resource locator (URL) is the address of a standard resource on the Internet. When performing a crawl task, a crawler machine starts from the URLs of one or several initial pages and continually extracts new URLs from the current page into a queue for searching, until a stopping condition of the system is met. In one example, the uniform resource locators may be obtained by random searching on the Internet.
Step S320: sending the multiple uniform resource locators to the web crawler machine cluster, where the crawler machines in the cluster crawl each uniform resource locator, and recording the crawl results.
Here, the multiple uniform resource locators are sent to the web crawler machine cluster, the crawler machines in the cluster crawl each uniform resource locator, and the crawl results of the crawler machines are recorded, so that a sufficient number of URLs is obtained and stored as crawler tasks.
Step S330: when the number of crawl results reaches a predetermined quantity, storing all the crawl results as crawler tasks in the system task database.
Here, the predetermined quantity is preconfigured; for example, it may be 1,000, 2,000, or 5,000. When the crawl results reach the predetermined quantity, crawling stops, and the recorded crawl results are stored as crawler tasks in the system task database for subsequent testing.
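Steps S310 to S330 can be sketched as follows. For brevity the sketch crawls the URLs serially instead of distributing them across the cluster; `crawl_one` is a hypothetical stand-in that returns a crawl result or `None` on failure, and storing nothing when the predetermined quantity is not reached is likewise an assumption. The `crawler_tasks` table name is illustrative.

```python
import sqlite3


def build_task_db(conn: sqlite3.Connection, urls, crawl_one, target_count: int) -> int:
    """Crawl URLs until `target_count` results are recorded, then store them as tasks.

    crawl_one(url): hypothetical crawl function returning a result or None.
    Returns the number of tasks stored (0 if the target was not reached).
    """
    results = []
    for url in urls:                       # S320: crawl each URL
        result = crawl_one(url)
        if result is not None:
            results.append(result)         # record the crawl result
        if len(results) >= target_count:   # S330: predetermined quantity reached
            break
    if len(results) < target_count:
        return 0                           # assumption: store nothing yet
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crawler_tasks (id INTEGER PRIMARY KEY, url TEXT)"
    )
    conn.executemany(
        "INSERT INTO crawler_tasks (url) VALUES (?)", [(r,) for r in results]
    )
    conn.commit()
    return len(results)
```

Because crawling stops as soon as the quantity is met, no work is wasted on URLs beyond what the test needs.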
Step S120: while the crawler task distributor distributes tasks to the web crawler machine cluster, obtaining the total working time of each crawler machine in the web crawler machine cluster.
Here, the crawler task distributor distributes crawler tasks to the web crawler machine cluster; when a crawler machine completes its current crawler task, the crawler task distributor continues by distributing the next crawler task to that crawler machine. The working time each crawler machine needs to complete each crawler task is recorded, and these working times are added up per crawler machine to obtain the total working time each crawler machine spent completing crawler tasks.
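The distribution pattern described here, where a machine receives its next task as soon as it finishes the current one, can be simulated with a priority queue keyed on the time each machine becomes free. This is a hedged simulation sketch: the task durations are hypothetical inputs known in advance, whereas in a real test they would be measured as each machine crawls.

```python
import heapq


def simulate_distribution(task_durations, machine_ids):
    """Give each task to whichever machine becomes free first (pull model).

    task_durations: hypothetical per-task crawl times in seconds, in dispatch order.
    Returns {machine_id: total working time in seconds}.
    """
    totals = {m: 0.0 for m in machine_ids}
    free_at = [(0.0, m) for m in machine_ids]  # (time machine becomes free, id)
    heapq.heapify(free_at)
    for duration in task_durations:
        t, m = heapq.heappop(free_at)       # machine that finishes its task first
        totals[m] += duration               # add this task's working time
        heapq.heappush(free_at, (t + duration, m))
    return totals


print(simulate_distribution([10, 20, 5, 5], ["a", "b"]))
```

Ties on the free time are broken by machine id, which keeps the simulation deterministic.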
In one exemplary embodiment, obtaining the total working time of each crawler machine in the web crawler machine cluster includes:
when each crawler machine receives a crawler task distributed by the crawler task distributor, recording the working time the crawler machine needs to complete that crawler task.
Here, when a crawler machine receives a crawler task distributed by the crawler task distributor, the moment the crawler machine starts crawling is taken as the start point and the moment it stops crawling as the end point, and the working time the crawler machine needed to complete the crawler task is recorded. For example, if a crawler machine starts crawling at 15:30 and stops at 15:35, having completed the crawler task, the working time it needed for that task is 5 min.
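The working time of a single task is the difference between the stop and start timestamps. A minimal sketch using Python's `datetime`; the function name and the calendar date are arbitrary choices for illustration:

```python
from datetime import datetime


def working_seconds(start: datetime, end: datetime) -> float:
    """Working time for one crawler task: stop time minus start time, in seconds."""
    if end < start:
        raise ValueError("end must not precede start")
    return (end - start).total_seconds()


# The example from the text: start at 15:30, stop at 15:35 (date is arbitrary).
print(working_seconds(datetime(2019, 5, 24, 15, 30), datetime(2019, 5, 24, 15, 35)))  # 300.0
```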
In one exemplary embodiment, recording the working time the crawler machine needs to complete the crawler task includes:
starting a timer when the crawler machine, having received the crawler task, begins crawling for the first time; and
stopping the timer when the crawler machine has crawled the crawler task a predetermined number of times, thereby obtaining the working time the crawler machine needed to complete the crawler task, and storing the working time in association with the crawler machine.
In this embodiment, the working time the crawler machine needs to complete the crawler task is obtained by timing, so the recorded working time is directly available without additional calculation, which avoids unnecessary power consumption.
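The timer-based embodiment could look like the sketch below: the timer starts at the first crawl, stops after a predetermined number of crawls, and the elapsed time is stored against the machine. `crawl` and the in-memory `working_times` store are hypothetical; `time.perf_counter` is used because it is intended for measuring elapsed intervals rather than wall-clock time.

```python
import time
from collections import defaultdict

# machine id -> recorded working times in seconds (hypothetical in-memory store)
working_times = defaultdict(list)


def timed_task(machine_id, crawl, task, repeat):
    """Time `repeat` crawls of one task and store the result for the machine.

    crawl(task): hypothetical function performing one crawl of the task.
    repeat: the predetermined number of crawls.
    """
    start = time.perf_counter()              # timer starts at the first crawl
    for _ in range(repeat):
        crawl(task)
    elapsed = time.perf_counter() - start    # timer stops after the last crawl
    working_times[machine_id].append(elapsed)
    return elapsed
```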
When the crawler task distributor has finished distributing tasks and all crawler tasks have been completed, the total working time of each crawler machine is calculated from the working times it needed to complete each of its crawler tasks.
Here, the working times each crawler machine needed for each of its crawler tasks are added up to obtain that crawler machine's total working time. For example, if a crawler machine completed three crawler tasks with working times of 70 s, 98 s, and 82 s, its total working time is 250 s.
Step S130: based on the total working time of each crawler machine, obtaining a judgment result indicating whether the workload of the crawler machines in the web crawler machine cluster is balanced.
Here, the longer a crawler machine's total working time, the larger its workload. From the total working time of each crawler machine, the workload of each crawler machine in the web crawler machine cluster can be derived, and from these workloads it can be determined whether the workload of the crawler machines in the cluster is balanced. An unbalanced workload indicates that some crawler machines have been idle for long periods, that is, the crawler machines are scheduled unreasonably, which reduces the working efficiency of the web crawler machine cluster. The user can adjust the web crawler system according to the judgment result so as to make full use of its performance and improve crawling efficiency.
Referring to Fig. 2, which is a schematic flowchart of step S130 in the test method for a web crawler system of Fig. 1 according to an exemplary embodiment of the present disclosure, in the embodiment shown in Fig. 2, obtaining, based on the total working time of each crawler machine, the judgment result indicating whether the workload of the crawler machines in the web crawler machine cluster is balanced includes:
Step S210: sorting the total working times of the crawler machines in ascending order to obtain a working time sequence;
Step S220: based on the obtained working time sequence, subtracting the first total working time in the sequence from the last total working time to obtain a time difference;
Step S230: dividing the time difference by the first total working time in the working time sequence to obtain the workload balance rate of the crawler machines in the web crawler machine cluster;
Step S240: judging, based on the balance rate, whether the workload of the crawler machines in the web crawler machine cluster is balanced.
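Steps S210 to S240 amount to computing (maximum minus minimum) divided by minimum over the machines' total working times and comparing the result with a threshold. A minimal sketch; the function names are illustrative, and rejecting an empty or non-positive sequence is an added guard the disclosure does not discuss:

```python
def balance_rate(total_times):
    """Steps S210 to S230: (last - first) / first of the ascending sequence."""
    seq = sorted(total_times)        # S210: ascending working time sequence
    if not seq or seq[0] <= 0:
        raise ValueError("need at least one positive total working time")
    diff = seq[-1] - seq[0]          # S220: last minus first
    return diff / seq[0]             # S230: divide by the first (smallest)


def is_balanced(total_times, threshold):
    """Step S240: balanced iff the balance rate does not exceed the threshold."""
    return balance_rate(total_times) <= threshold


print(round(balance_rate([125, 113, 98, 136]), 4))
```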
In this example embodiment, the total working times of the crawler machines are sorted in ascending order. For example, if the web crawler machine cluster includes 4 crawler machines whose total working times are 125 s, 113 s, 98 s, and 136 s, sorting them in ascending order gives the working time sequence (98, 113, 125, 136). Based on this sequence, the working time ranked first is subtracted from the working time ranked last, that is, the minimum of the sequence is subtracted from its maximum, to obtain the time difference. For the sequence (98, 113, 125, 136), the time difference is 136 - 98 = 38.
The calculated time difference is divided by the total working time ranked first in the working time sequence, giving the ratio of the time difference to the smallest total working time; this ratio is the workload balance rate of the crawler machines in the web crawler machine cluster. For example, for the 4 crawler machines with working time sequence (98, 113, 125, 136) and time difference 136 - 98 = 38, dividing the time difference by the first total working time gives a balance rate of 38/98 ≈ 38.78%.
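The same calculation as a formula, writing the working time sequence sorted in ascending order as t_(1) ≤ t_(2) ≤ … ≤ t_(n) (this notation is ours, not the disclosure's):

```latex
r = \frac{t_{(n)} - t_{(1)}}{t_{(1)}}, \qquad
r = \frac{136 - 98}{98} = \frac{38}{98} \approx 0.3878 = 38.78\%
```

Because 38.78% exceeds each of the example thresholds of 10%, 20%, and 25% given for step S240, this example cluster would be judged unbalanced.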
The balance rate directly shows how much more work the crawler machine with the longest total working time does than the crawler machine with the shortest total working time, relative to the workload of the latter. The larger the balance rate, the more work the longest-running crawler machine does compared with the shortest-running one, that is, the less balanced the workload of the crawler machines in the web crawler machine cluster. Conversely, the smaller the balance rate, the smaller that excess, that is, the more balanced the workload of the crawler machines in the cluster.
In one exemplary embodiment, judging, based on the balance rate, whether the workload of the crawler machines in the web crawler machine cluster is balanced includes:
when the balance rate is less than or equal to a predetermined threshold, determining that the workload of the crawler machines in the web crawler machine cluster is balanced; and
when the balance rate is greater than the predetermined threshold, determining that the workload of the crawler machines in the web crawler machine cluster is unbalanced.
Here, the predetermined threshold is preconfigured and may be, for example, 10%, 20%, or 25%; this example places no particular limitation on this. In one example, the predetermined threshold may be obtained through a user device such as a mobile phone or a computer. The user device displays an acquisition interface, and the threshold is obtained when the user triggers a function on that interface: for example, when the user clicks the "predetermined threshold input" button, an input box appears on the interface, and the user enters the predetermined threshold in the input box through an input device such as a keyboard or a touch screen.
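Converting the text a user types into the input box into a numeric threshold might look like this. The accepted formats (a percentage such as `20%` or a fraction such as `0.2`) and the function name are assumptions for illustration:

```python
def parse_threshold(text: str) -> float:
    """Convert a user-entered threshold such as "20%" or "0.2" into a fraction.

    The accepted formats are an assumption; the disclosure only says the
    user types the threshold into an input box.
    """
    s = text.strip()
    if s.endswith("%"):
        value = float(s[:-1]) / 100.0   # "20%" -> 0.2
    else:
        value = float(s)                # "0.2" -> 0.2; float() raises ValueError on junk
    if value <= 0.0:
        raise ValueError(f"threshold must be positive, got {text!r}")
    return value
```

Normalizing both input styles to a fraction lets the result be compared directly with a balance rate computed as a ratio.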
Embodiments of the present disclosure further provide a test device for a web crawler system. Referring to Fig. 4, the example test device for a web crawler system may include a task acquisition module 410, a time recording module 420, and a judgment module 430, where:
the task acquisition module 410 is configured to obtain crawler tasks from the system task database upon receiving a test request signal and to send the crawler tasks to the crawler task distributor;
the time recording module 420 is configured to obtain the total working time of each crawler machine in the web crawler machine cluster while the crawler task distributor distributes tasks to the web crawler machine cluster; and
the judgment module 430 is configured to obtain, based on the total working time of each crawler machine, a judgment result indicating whether the workload of the crawler machines in the web crawler machine cluster is balanced.
In an exemplary embodiment, the judgment module 430 further includes a sorting unit 431, a first computing unit 432, a second computing unit 433, and a judging unit 434, where:
the sorting unit 431 is configured to sort the total working times of the crawler machines in ascending order to obtain a working time sequence;
the first computing unit 432 is configured to subtract, in the obtained working time sequence, the first total working time from the last total working time to obtain a time difference;
the second computing unit 433 is configured to divide the time difference by the first total working time in the working time sequence to obtain the workload balance rate of the crawler machines in the web crawler machine cluster; and
the judging unit 434 is configured to judge, based on the balance rate, whether the workload of the crawler machines in the web crawler machine cluster is balanced.
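One way to mirror the module and unit structure of Fig. 4 in code is to compose the judgment module 430 from four small units. This is a hedged sketch: the class names are hypothetical, and only the judgment module is shown.

```python
class SortingUnit:                # unit 431
    def run(self, total_times):
        return sorted(total_times)            # ascending working time sequence


class FirstComputingUnit:         # unit 432
    def run(self, seq):
        return seq[-1] - seq[0]               # last minus first


class SecondComputingUnit:        # unit 433
    def run(self, diff, seq):
        return diff / seq[0]                  # balance rate


class JudgingUnit:                # unit 434
    def __init__(self, threshold):
        self.threshold = threshold

    def run(self, rate):
        return rate <= self.threshold         # True means balanced


class JudgmentModule:             # module 430, composed of units 431 to 434
    def __init__(self, threshold):
        self.sorting = SortingUnit()
        self.first = FirstComputingUnit()
        self.second = SecondComputingUnit()
        self.judging = JudgingUnit(threshold)

    def run(self, total_times):
        seq = self.sorting.run(total_times)
        diff = self.first.run(seq)
        rate = self.second.run(diff, seq)
        return self.judging.run(rate)


print(JudgmentModule(0.2).run([125, 113, 98, 136]))
```

Keeping each unit separate matches the note that follows in the text: units can be merged into one module or split further without changing the overall result.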
The specific details of each module in the above test device for a web crawler system have been described in detail in the corresponding test method for the web crawler system and are therefore not repeated here.
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by multiple modules or units.
In addition, although the steps of the method in the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all the illustrated steps must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described herein may be implemented in software, or in software combined with necessary hardware. Accordingly, the technical solution according to embodiments of the present disclosure may be embodied as a software product, which may be stored on a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network, and which includes instructions that cause a computing device (such as a personal computer, server, mobile terminal, or network device) to execute the method according to embodiments of the present disclosure.
Those skilled in the art will understand that various aspects of the present invention may be implemented as a system, a method, or a program product. Therefore, various aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may be collectively referred to herein as a "circuit", "module", or "system".
According to an exemplary embodiment, the present disclosure may be implemented as an electronic device comprising a memory and a processor. A computer program is stored in the memory. When executed by the processor, the computer program causes the processor to perform any of the method embodiments described above; alternatively, when executed by the processor, the computer program causes the electronic device to implement the functions of the component units/modules of the device embodiments described above.
The processor described in the above embodiments may be a single processing unit, such as a central processing unit (CPU), or a distributed processor system comprising multiple dispersed processing units.
The memory described in the above embodiments may include one or more memories. These may be internal memory of the computing device, such as various transient or non-transient memories, or external storage connected to the computing device through a memory interface.
An electronic device 500 according to this embodiment of the present invention is described below with reference to Fig. 5. The electronic device 500 shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 5, the electronic device 500 takes the form of a general-purpose computing device. Its components may include, but are not limited to: the at least one processing unit 510, the at least one storage unit 520, and a bus 530 connecting the different system components (including the storage unit 520 and the processing unit 510).
The storage unit stores program code that can be executed by the processing unit 510, so that the processing unit 510 performs the steps of the various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification. For example, the processing unit 510 may perform the steps shown in Fig. 1: step S110, when a test request signal is received, obtaining crawler tasks from the system task database and sending the crawler tasks to the crawler task sorter; step S120, when the crawler task sorter distributes tasks to the web crawler machine cluster, obtaining the total working time of each crawler machine in the web crawler machine cluster; and step S130, obtaining, according to the total working time of each crawler machine, a judgment result indicating whether the workload of the crawler machines in the web crawler machine cluster is balanced.
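Steps S110 to S130 can be sketched as a small simulation. Everything named here is a hypothetical illustration, not the patent's implementation: the `CrawlerMachine` class, the `run_balance_test` function, the injected `crawl_once` and `clock` callables, the default threshold of 0.3, and the round-robin distribution that stands in for the crawler task sorter (whose real policy is exactly what such a test would evaluate). Per-task timing follows claims 2 and 3: timing starts at the first crawl, stops after a predetermined number of crawls, and the per-task times are summed into each machine's total working time.

```python
import time
from itertools import cycle


class CrawlerMachine:
    """Hypothetical stand-in for one crawler machine in the cluster."""

    def __init__(self, name, crawl_once, clock=time.perf_counter):
        self.name = name
        self.crawl_once = crawl_once  # performs a single crawl of a task (assumed)
        self.clock = clock            # injectable so tests can use a fake clock
        self.task_times = []          # working time recorded for each finished task

    def run_task(self, task, crawl_count):
        start = self.clock()                 # start timing at the first crawl
        for _ in range(crawl_count):         # predetermined number of crawls
            self.crawl_once(task)
        # Stop timing after the last crawl and store the time with this machine.
        self.task_times.append(self.clock() - start)


def run_balance_test(task_db, machines, crawl_count=1, threshold=0.3):
    """Sketch of S110-S130; returns (balance_rate, is_balanced)."""
    tasks = list(task_db)  # S110: fetch crawler tasks from the task database

    # S120: a round-robin "sorter" (assumed policy) hands tasks to machines in turn.
    for machine, task in zip(cycle(machines), tasks):
        machine.run_task(task, crawl_count)
    totals = [sum(m.task_times) for m in machines]  # total working time per machine
    if not totals:
        raise ValueError("no crawler machines to evaluate")

    # S130: sort ascending, (last - first) / first, compare with the threshold.
    ordered = sorted(totals)
    if ordered[0] <= 0:
        # An idle machine makes the rate undefined; the source does not cover this.
        raise ValueError("a crawler machine recorded no working time")
    rate = (ordered[-1] - ordered[0]) / ordered[0]
    return rate, rate <= threshold
```

For a deterministic run, pass a fake clock that each simulated crawl advances. On a real cluster `clock` would stay `time.perf_counter`, and collecting the times from remote machines, which this sketch does not model, would be the harder part.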
The storage unit 520 may include readable media in the form of volatile storage units, such as a random access memory (RAM) unit 5201 and/or a cache memory unit 5202, and may further include a read-only memory (ROM) unit 5203.
The storage unit 520 may also include a program/utility 5204 having a set of (at least one) program modules 5205. Such program modules 5205 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 530 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit or local bus using any of a variety of bus structures.
The electronic device 500 may also communicate with one or more external devices 700 (such as a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 500, and/or with any device (such as a router or a modem) that enables the electronic device 500 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 550. The electronic device 500 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 560. As shown, the network adapter 560 communicates with the other modules of the electronic device 500 through the bus 530.
It should be understood that, although not shown in the drawings, other hardware and/or software modules may be used in conjunction with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
An exemplary embodiment of the present disclosure further provides a computer-readable storage medium on which a program product capable of implementing the above method of this specification is stored. In some possible embodiments, various aspects of the present invention may also be implemented in the form of a program product comprising program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification.
Referring to Fig. 6, a program product 600 for implementing the above method according to an embodiment of the present invention is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code and may run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in connection with, an instruction execution system, apparatus, or device.
The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
Program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wireline, optical cable, RF, or any suitable combination of the above.
Program code for performing the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user computing device, partly on the user device as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, over the Internet through an Internet service provider).
In addition, the above drawings are merely schematic illustrations of the processing included in the methods according to exemplary embodiments of the present invention and are not intended to be limiting. The processing shown in the drawings does not indicate or limit the chronological order of these processes, and these processes may be performed, for example, synchronously or asynchronously in multiple modules.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or conventional technical means in the art that are not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the disclosure indicated by the claims.
Claims (10)
1. A test method for a web crawler system, characterized by comprising:
when a test request signal is received, obtaining crawler tasks from a system task database and sending the crawler tasks to a crawler task sorter;
when the crawler task sorter distributes tasks to a web crawler machine cluster, obtaining the total working time of each crawler machine in the web crawler machine cluster;
obtaining, according to the total working time of each crawler machine, a judgment result indicating whether the workload of the crawler machines in the web crawler machine cluster is balanced.
2. The test method for a web crawler system according to claim 1, characterized in that obtaining the total working time of each crawler machine in the web crawler machine cluster comprises:
when each crawler machine receives a crawler task distributed by the crawler task sorter, recording the working time the crawler machine requires to complete the crawler task;
when the crawler task sorter has finished distributing tasks and all crawler tasks have been completed, calculating the total working time of each crawler machine based on the working time it required to complete each crawler task.
3. The test method for a web crawler system according to claim 2, characterized in that recording the working time the crawler machine requires to complete the crawler task comprises:
when the crawler machine receives the crawler task, starting timing when the crawler machine begins its first crawl;
ending timing when the crawler machine has completed a predetermined number of crawls for the crawler task, thereby obtaining the working time the crawler machine required to complete the crawler task, and storing the working time in association with the crawler machine.
4. The test method for a web crawler system according to claim 1, characterized in that obtaining, according to the total working time of each crawler machine, the judgment result indicating whether the workload of the crawler machines in the web crawler machine cluster is balanced comprises:
sorting the total working times of the crawler machines in ascending order to obtain a working time sequence;
based on the obtained working time sequence, subtracting the first total working time in the working time sequence from the last total working time to obtain a time difference;
dividing the time difference by the first total working time in the working time sequence to obtain the balance rate of the workload of the crawler machines in the web crawler machine cluster;
judging, based on the balance rate, whether the workload of the crawler machines in the web crawler machine cluster is balanced.
5. The test method for a web crawler system according to claim 4, characterized in that judging, based on the balance rate, whether the workload of the crawler machines in the web crawler machine cluster is balanced comprises:
when the balance rate is less than or equal to a predetermined threshold, determining that the workload of the crawler machines in the web crawler machine cluster is balanced;
when the balance rate is greater than the predetermined threshold, determining that the workload of the crawler machines in the web crawler machine cluster is unbalanced.
6. The test method for a web crawler system according to claim 1, characterized in that, before obtaining the crawler tasks from the system task database, the method further comprises:
obtaining a plurality of uniform resource locators (URLs);
sending the plurality of URLs to the web crawler machine cluster, so that the crawler machines in the web crawler machine cluster crawl each URL, and recording the crawl results;
when the number of crawl results reaches a predetermined quantity, storing all the crawl results as crawl tasks in the system task database.
7. A test device for a web crawler system, characterized by comprising:
a task acquisition module configured to, when a test request signal is received, obtain crawler tasks from a system task database and send the crawler tasks to a crawler task sorter;
a time recording module configured to, when the crawler task sorter distributes tasks to a web crawler machine cluster, obtain the total working time of each crawler machine in the web crawler machine cluster;
a judgment module configured to obtain, according to the total working time of each crawler machine, a judgment result indicating whether the workload of the crawler machines in the web crawler machine cluster is balanced.
8. The test device for a web crawler system according to claim 7, characterized in that the judgment module comprises:
a sorting unit configured to sort the total working times of the crawler machines in ascending order to obtain a working time sequence;
a first computing unit configured to subtract, based on the obtained working time sequence, the first total working time in the working time sequence from the last total working time to obtain a time difference;
a second computing unit configured to divide the time difference by the first total working time in the working time sequence to obtain the balance rate of the workload of the crawler machines in the web crawler machine cluster;
a judging unit configured to judge, based on the balance rate, whether the workload of the crawler machines in the web crawler machine cluster is balanced.
9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the computer program implements the test method for a web crawler system according to any one of claims 1 to 6.
10. An electronic device, characterized by comprising:
a processor; and
a memory having a computer program stored thereon;
wherein the processor is configured to implement, by executing the computer program, the test method for a web crawler system according to any one of claims 1 to 6.
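Claim 6 adds a preparation step before S110: seed URLs are crawled by the cluster and the results become the crawl tasks used by the test. A minimal sketch, assuming the system task database is a SQLite table named `crawl_tasks` and that each crawler machine can be modeled as a callable taking a URL and returning its crawl result as text (the function `seed_task_database`, the table, and the callables are all hypothetical):

```python
import sqlite3
from itertools import cycle


def seed_task_database(urls, crawlers, required_count, conn):
    """Sketch of claim 6: crawl seed URLs, then store the results as crawl tasks.

    crawlers: hypothetical callables, one per crawler machine, each taking a URL
              and returning its crawl result as a string.
    required_count: the "predetermined quantity" of crawl results.
    conn: an open sqlite3 connection standing in for the system task database.
    Returns the number of tasks stored (0 if the quantity was not reached).
    """
    results = []
    # Hand the URLs to the cluster's machines in turn and record each result.
    for crawl, url in zip(cycle(crawlers), urls):
        results.append((url, crawl(url)))

    if len(results) < required_count:
        return 0  # not enough results yet; leave the task database untouched

    with conn:  # commit the batch together, or roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS crawl_tasks (url TEXT NOT NULL, result TEXT)"
        )
        conn.executemany(
            "INSERT INTO crawl_tasks (url, result) VALUES (?, ?)", results
        )
    return len(results)
```

Returning 0 instead of storing a partial batch mirrors the claim's condition that results are stored only once their number reaches the predetermined quantity; a real system would presumably keep crawling until it does.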
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910444805.8A CN110333980A (en) | 2019-05-24 | 2019-05-24 | The test method and device of network crawler system, storage medium, electronic equipment |
PCT/CN2019/123059 WO2020238131A1 (en) | 2019-05-24 | 2019-12-04 | Web crawler system testing method and apparatus, storage medium, and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910444805.8A CN110333980A (en) | 2019-05-24 | 2019-05-24 | The test method and device of network crawler system, storage medium, electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110333980A true CN110333980A (en) | 2019-10-15 |
Family ID: 68140378
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910444805.8A Pending CN110333980A (en) | 2019-05-24 | 2019-05-24 | The test method and device of network crawler system, storage medium, electronic equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110333980A (en) |
WO (1) | WO2020238131A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020238131A1 (en) * | 2019-05-24 | 2020-12-03 | 深圳壹账通智能科技有限公司 | Web crawler system testing method and apparatus, storage medium, and electronic device |
CN115328812A (en) * | 2022-10-11 | 2022-11-11 | 深圳华锐分布式技术股份有限公司 | UI (user interface) testing method, device, equipment and medium based on web crawler |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040225644A1 (en) * | 2003-05-09 | 2004-11-11 | International Business Machines Corporation | Method and apparatus for search engine World Wide Web crawling |
CN106202108A (en) * | 2015-05-06 | 2016-12-07 | 阿里巴巴集团控股有限公司 | Web crawlers captures method for allocating tasks and device and data grab method and device |
CN106648445A (en) * | 2015-10-30 | 2017-05-10 | 北京国双科技有限公司 | Data storage method and apparatus used for crawler |
CN107071009A (en) * | 2017-03-28 | 2017-08-18 | 江苏飞搏软件股份有限公司 | A kind of distributed big data crawler system of load balancing |
CN107203623A (en) * | 2017-05-26 | 2017-09-26 | 山东省科学院情报研究所 | The load balancing adjusting method of network crawler system |
CN107562541A (en) * | 2017-09-05 | 2018-01-09 | 广东科杰通信息科技有限公司 | A kind of distributed reptile method of load balancing, crawler system |
CN108205541A (en) * | 2016-12-16 | 2018-06-26 | 北大方正集团有限公司 | The dispatching method and device of distributed network reptile task |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9253154B2 (en) * | 2008-08-12 | 2016-02-02 | Mcafee, Inc. | Configuration management for a capture/registration system |
CN110333980A (en) * | 2019-05-24 | 2019-10-15 | 深圳壹账通智能科技有限公司 | The test method and device of network crawler system, storage medium, electronic equipment |
2019
- 2019-05-24 CN CN201910444805.8A (CN110333980A) active Pending
- 2019-12-04 WO PCT/CN2019/123059 (WO2020238131A1) active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2020238131A1 (en) | 2020-12-03 |
Similar Documents
Publication | Title |
---|---|
CN110083455B (en) | Graph calculation processing method, graph calculation processing device, graph calculation processing medium and electronic equipment |
CN106649084B (en) | The acquisition methods and device of function call information, test equipment |
CN103294485B (en) | Web service method for packing and system for ABINIT concurrent computational system |
CN110716783A (en) | Front-end page generation and deployment method and device, storage medium and equipment |
EP3011442A1 (en) | Method and apparatus for customized software development kit (sdk) generation |
CN111045653B (en) | System generation method and device, computer readable medium and electronic equipment |
CN109446038A (en) | The statistical method and terminal device of page access duration |
CN110007819A (en) | The operation indicating method, apparatus and computer readable storage medium of system |
CN110471585A (en) | Function of application icon methods of exhibiting, device and computer equipment |
CN110333980A (en) | The test method and device of network crawler system, storage medium, electronic equipment |
CN112988185A (en) | Cloud application updating method, device and system, electronic equipment and storage medium |
CN110781180A (en) | Data screening method and data screening device |
CN112306471A (en) | Task scheduling method and device |
CN111427577A (en) | Code processing method and device and server |
CN111352951A (en) | Data export method, device and system |
CN107066536A (en) | Comment determines method and device |
CN116974874A (en) | Database testing method and device, electronic equipment and readable storage medium |
CN105426183B (en) | A kind of form validation method |
CN108959294A (en) | A kind of method and apparatus accessing search engine |
CN116225690A (en) | Memory multidimensional database calculation load balancing method and system based on docker |
CN112463574A (en) | Software testing method, device, system, equipment and storage medium |
CN111666201A (en) | Regression testing method, device, medium and electronic equipment |
CN109376048A (en) | A kind of test method and equipment of touch screen |
CN112162963A (en) | Data synchronization method and device, computer equipment and storage medium |
CN107526827A | Method, equipment and computer-readable recording medium for question and answer displaying |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
CB02 | Change of applicant information | Address after: Room 201, Building A, No. 1 front Bay Road, Qianhai Shenzhen-Hong Kong Cooperation Zone, Shenzhen, Guangdong 518000 (Qianhai business secretary); Applicant after: ONECONNECT FINANCIAL TECHNOLOGY Co.,Ltd. (SHANGHAI); Address before: Room 201, Building A, No. 1 front Bay Road, Qianhai Shenzhen-Hong Kong Cooperation Zone, Shenzhen, Guangdong 518000; Applicant before: ONECONNECT FINANCIAL TECHNOLOGY Co.,Ltd. (SHANGHAI) |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20191015 |