CN110442766A - Webpage data acquiring method, device, equipment and storage medium - Google Patents


Publication number
CN110442766A
Authority
CN
China
Prior art keywords
task
acquisition
data
node
operation resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910627107.1A
Other languages
Chinese (zh)
Inventor
董晨辉
任延辉
谷广鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Big Data Technologies Co Ltd
Original Assignee
New H3C Big Data Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Big Data Technologies Co Ltd
Priority to CN201910627107.1A
Publication of CN110442766A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload

Abstract

The disclosure provides a webpage data acquisition method, device, equipment and storage medium, and relates to the technical field of data processing. The method creates a task corresponding to a data acquisition demand according to the information of that demand and a preset script template, so that a corresponding task can be created for each data acquisition demand and its preset script template. The method then obtains the operation resources required by the task and the operation resource status of multiple acquisition nodes, determines from the multiple acquisition nodes an acquisition node that matches the task as the target node, and dispatches the task to the target node so that the target node acquires data according to the task. Because the operation resources required by the task are associated with the operation resource status of the acquisition nodes, a suitable acquisition node can be selected for the task, which improves the efficiency with which acquisition nodes execute tasks, that is, the data acquisition efficiency.

Description

Webpage data acquiring method, device, equipment and storage medium
Technical field
The present disclosure relates to the technical field of data processing, and in particular to a webpage data acquisition method, device, equipment and storage medium.
Background
A web crawler is a program or script that automatically captures web information according to preset rules. With web crawler technology, different kinds of data such as pictures, databases, and audio/video multimedia can be searched purposefully, so that information content, including densely packed data with a certain structure, can be found and obtained according to set capture conditions. The technology is widely used for information capture by major search engines.
For websites with massive amounts of data, such as news and social websites, a distributed crawler system is generally used. A distributed crawler system offers stronger crawling capability than a single-point crawler system: through task scheduling, a scheduling node in the system dispatches tasks to working nodes, which crawl the data. In existing systems, the task scheduling node selects a working node at random to execute each data acquisition task.
However, because the task scheduling node selects working nodes at random to execute data acquisition tasks, existing distributed crawler systems suffer from low task execution efficiency on the nodes.
Summary of the invention
In view of the above deficiency in the prior art, the purpose of the present disclosure is to provide a webpage data acquisition method, device, equipment and storage medium that solve the problem of low task execution efficiency on nodes in the prior art.
To achieve the above purpose, the embodiments of the present disclosure adopt the following technical solutions:
In a first aspect, an embodiment of the present disclosure provides a webpage data acquisition method, the method comprising: creating a task corresponding to a data acquisition demand according to the information of the data acquisition demand and a preset script template; obtaining the operation resources required by the task and the operation resource status of multiple acquisition nodes; determining a target node from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes; and dispatching the task to the target node, so that the target node acquires webpage data according to the task.
Optionally, before creating the task corresponding to the data acquisition demand according to the information of the data acquisition demand and the preset script template, the method comprises: determining, according to the data acquisition demand, the script template corresponding to the data acquisition demand from a preset script template library as the preset script template. The script template library includes script templates corresponding to at least one data acquisition demand, and different data acquisition demands correspond to different script templates.
Optionally, obtaining the operation resources required by the task and the operation resource status of the multiple acquisition nodes comprises: determining, according to the information of the task, the operation resources required by the task from a preset task learning resource library, the task learning resource library including information on the operation resources required by at least one task in the class to which the task belongs; and obtaining the operation resource status from the multiple acquisition nodes.
Optionally, before determining the operation resources required by the task from the preset task learning resource library according to the information of the task, the method further comprises: determining the operation resources required by each task according to the tasks, in the class to which the task belongs, that ran on the multiple acquisition nodes within a historical time period and the operation resource status of each acquisition node; and obtaining the task learning resource library according to the information of the operation resources required by at least one task.
Optionally, obtaining the operation resource status from the multiple acquisition nodes comprises: sending heartbeat requests to the multiple acquisition nodes at a preset time period; and receiving the heartbeat response sent by each acquisition node, the heartbeat response including the working state of the acquisition node and the operation resource status of the acquisition node.
Optionally, the method further comprises: receiving the operation resource status and the execution status of the task that the target node sends at the preset time period while acquiring data.
Optionally, if the execution status of the task includes task abort information, which indicates that the data acquisition corresponding to the task has not completed but the task has ended, the method further comprises: receiving the running log of the task returned by the target node, the running log including the operating parameters of the task during its run.
Optionally, the method further comprises: receiving end information of the task sent by the target node; and cleaning the acquired data and/or loading it into storage.
Optionally, the operation resource status includes one or a combination of the following parameters: parameters of physical resources, parameters of network conditions, and parameters of running time.
Optionally, the data acquisition demand includes information of a target webpage and/or a target data type.
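As an illustration of the task learning resource library mentioned in the first aspect, the following minimal sketch derives the resources required by each task class from historical runs. The source does not state how history is aggregated; taking the peak observed usage per resource is an assumption made here, and the names `Observation` and `build_resource_library`, as well as the resource names, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One historical observation: (task class, resource name, amount used).
# The tuple layout is an assumption made for this sketch.
Observation = Tuple[str, str, float]


def build_resource_library(
    history: Iterable[Observation],
) -> Dict[str, Dict[str, float]]:
    """Map each task class to the peak usage observed for each resource."""
    library: Dict[str, Dict[str, float]] = defaultdict(dict)
    for task_class, resource, used in history:
        current = library[task_class].get(resource, 0.0)
        # Assumption: the required amount is the highest usage seen so far.
        library[task_class][resource] = max(current, used)
    return {cls: dict(res) for cls, res in library.items()}


history = [
    ("news", "cpu_cores", 1.5),
    ("news", "cpu_cores", 2.0),
    ("news", "memory_mb", 512.0),
    ("social", "memory_mb", 1024.0),
]
library = build_resource_library(history)
print(library["news"])
```

A scheduler could then look up `library[task_class]` when a new task of that class is created. Other aggregations, such as a mean or a percentile, would fit the same interface.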
In a second aspect, an embodiment of the present disclosure further provides a webpage data acquisition device, the device comprising a creation module, an obtaining module, a determining module and an acquisition module. The creation module is configured to create a task corresponding to a data acquisition demand according to the information of the data acquisition demand and a preset script template. The obtaining module is configured to obtain the operation resources required by the task and the operation resource status of multiple acquisition nodes. The determining module is configured to determine a target node from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes. The acquisition module is configured to dispatch the task to the target node, so that the target node acquires webpage data according to the task.
Optionally, the creation module is specifically configured to determine, according to the data acquisition demand, the script template corresponding to the data acquisition demand from a preset script template library as the preset script template. The script template library includes script templates corresponding to at least one data acquisition demand, and different data acquisition demands correspond to different script templates.
Optionally, the obtaining module is specifically configured to determine, according to the information of the task, the operation resources required by the task from a preset task learning resource library, the task learning resource library including information on the operation resources required by at least one task in the class to which the task belongs; and to obtain the operation resource status from the multiple acquisition nodes.
Optionally, the determining module is specifically configured to determine the operation resources required by each task according to the tasks, in the class to which the task belongs, that ran on the multiple acquisition nodes within a historical time period and the operation resource status of each acquisition node; and to obtain the task learning resource library according to the information of the operation resources required by at least one task.
Optionally, the obtaining module is specifically configured to send heartbeat requests to the multiple acquisition nodes at a preset time period, and to receive the heartbeat response sent by each acquisition node, the heartbeat response including the working state of the acquisition node and the operation resource status of the acquisition node.
Optionally, the device further comprises a first receiving module, configured to receive the operation resource status and the execution status of the task that the target node sends at the preset time period while acquiring data.
Optionally, if the execution status of the task includes task abort information, which indicates that the data acquisition corresponding to the task has not completed but the task has ended, the first receiving module is further configured to receive the running log of the task returned by the target node, the running log including the operating parameters of the task during its run.
Optionally, the device further comprises a second receiving module and an operation module. The second receiving module is configured to receive end information of the task sent by the target node. The operation module is configured to clean the acquired data and/or load it into storage.
Optionally, the operation resource status includes one or a combination of the following parameters: parameters of physical resources, parameters of network conditions, and parameters of running time.
Optionally, the data acquisition demand includes information of a target webpage and/or a target data type.
In a third aspect, an embodiment of the present disclosure further provides webpage data acquisition equipment, comprising a processor, a storage medium and a bus. The storage medium stores machine-readable instructions executable by the processor. When the data acquisition equipment runs, the processor communicates with the storage medium over the bus and executes the machine-readable instructions to perform the steps of the webpage data acquisition method described in the first aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a storage medium on which a computer program is stored. When run by a processor, the computer program performs the steps of the webpage data acquisition method described in the first aspect.
The beneficial effects of the present disclosure are as follows:
The embodiments of the present disclosure provide a webpage data acquisition method, device, equipment and storage medium. The method creates a task corresponding to a data acquisition demand according to the information of that demand and a preset script template, so that a corresponding task can be created for each data acquisition demand and its preset script template. The method then obtains the operation resources required by the task and the operation resource status of multiple acquisition nodes, determines from the multiple acquisition nodes an acquisition node that matches the task as the target node, and dispatches the task to the target node so that the target node acquires webpage data according to the task. Because the operation resources required by the task are associated with the operation resource status of the acquisition nodes, a suitable acquisition node can be selected for the task, which improves the efficiency with which acquisition nodes execute tasks, that is, the data acquisition efficiency.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the present disclosure and therefore should not be regarded as limiting its scope. Those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is a block diagram of a webpage data acquisition system provided by an embodiment of the present disclosure;
Fig. 2 is a flow diagram of a webpage data acquisition method provided by an embodiment of the present disclosure;
Fig. 3 is a flow diagram of another webpage data acquisition method provided by an embodiment of the present disclosure;
Fig. 4 is a flow diagram of another webpage data acquisition method provided by an embodiment of the present disclosure;
Fig. 5 is a flow diagram of another webpage data acquisition method provided by an embodiment of the present disclosure;
Fig. 6 is a flow diagram of another webpage data acquisition method provided by an embodiment of the present disclosure;
Fig. 7 is a flow diagram of another webpage data acquisition method provided by an embodiment of the present disclosure;
Fig. 8 is a flow diagram of another webpage data acquisition method provided by an embodiment of the present disclosure;
Fig. 9 is a structural diagram of a webpage data acquisition device provided by an embodiment of the present disclosure;
Fig. 10 is a structural diagram of another webpage data acquisition device provided by an embodiment of the present disclosure;
Fig. 11 is a structural diagram of another webpage data acquisition device provided by an embodiment of the present disclosure;
Fig. 12 is a structural diagram of webpage data acquisition equipment provided by an embodiment of the present disclosure.
Detailed description of the embodiments
To make the purposes, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present disclosure. The components of the embodiments generally described and illustrated in the drawings can be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the present disclosure, but merely represents selected embodiments. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings, so once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. In the embodiments of the present disclosure, "multiple" means two or more. "And/or" describes an association between objects and covers three cases: for example, "A and/or B" can mean A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
Fig. 1 is a block diagram of a webpage data acquisition system provided by an embodiment of the present disclosure. As shown in Fig. 1, the data acquisition system may include a scheduling node 110, a network 120, an acquisition node 130 and a database 140. The scheduling node 110 and/or the acquisition node 130 may include a processor that executes instructions. The scheduling node 110, that is, the task scheduling node, may be a server or another device with a scheduling function. The acquisition node 130 is a working node in the data acquisition system, that is, a node that runs data acquisition tasks, and is also called a task acquisition node. The acquisition node 130 may be a server or another device with a data acquisition function.
In some embodiments, the scheduling node 110 and/or the acquisition node 130 may be a single node or a node group, also called a node cluster. The node group may be a centralized cluster or a distributed cluster. For example, the scheduling node 110 is a scheduling node in a distributed cluster, and the acquisition node 130 is an acquisition node in a distributed cluster.
In some embodiments, the scheduling node 110 and/or the acquisition node 130 may include a processor. In some embodiments, the processor may include one or more processing cores (for example, single-core processor(s) or multi-core processor(s)). Merely as examples, the processor may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a graphics processing unit (GPU), a physics processing unit (PPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a controller, a microcontroller unit, a reduced instruction set computer (RISC), a microprocessor, or the like, or any combination thereof.
The network 120 can be used to exchange information and/or data. In some embodiments, one or more components in the data acquisition system (for example, the scheduling node 110 and the acquisition node 130) can send information and/or data to other components. In some embodiments, the network 120 can be any type of wired or wireless network, or a combination of them. Merely as examples, the network 120 may include a cable network, a wireless network, a fiber-optic network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, a ZigBee network, a near-field communication (NFC) network, or the like, or any combination thereof. In some embodiments, the network 120 may include one or more network access points. The network 120 may include wired or wireless network access points, such as base stations and/or network switching nodes, through which the scheduling node 110 and/or the acquisition node 130 can connect to the network 120 to exchange data and/or information.
In some embodiments, the database 140 may be connected to the network 120 to communicate with one or more components in the data acquisition system (for example, the scheduling node 110 and/or the acquisition node 130). One or more components in the data acquisition system can access the data or instructions stored in the database 140 via the network 120. In some embodiments, the database 140 may be directly connected to one or more components in the data acquisition system, or the database 140 may be part of the scheduling node 110 and/or the acquisition node 130.
Fig. 2 is a flow diagram of a webpage data acquisition method provided by an embodiment of the present disclosure. The method may be executed by the scheduling node 110 in the data acquisition system of Fig. 1. As shown in Fig. 2, the method includes:
S101, create a task corresponding to a data acquisition demand according to the information of the data acquisition demand and a preset script template.
The information of the data acquisition demand may include information of a target webpage and/or a target data type, among other information. The target webpage is the webpage from which data is to be acquired and may be, for example, a social or news webpage. The target data type may be, for example, pictures, databases, audio, or video multimedia. Depending on the actual application, the target webpage may also include other categories of webpages and the target data type may also include other data types; the present disclosure does not limit the information of the data acquisition demand.
The data acquisition demand may be entered by a user, obtained from another device, or obtained by parsing a document. Once the data acquisition demand is obtained, the script template corresponding to it can first be determined according to the demand, and the task corresponding to the demand is then created by executing S101. Different data acquisition demands correspond to different script templates. Creating the task from a preset script template improves both the reusability of script templates and the efficiency of creating the task for a data acquisition demand. Optionally, the preset script template may be a crawler script template, in which case the task created from it for the data acquisition demand may be a crawler task.
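Step S101 can be pictured as filling a preset script template with the information of the demand. The sketch below is illustrative only: the classes `AcquisitionDemand` and `Task`, the function `create_task`, the template text and the example URL are assumptions and do not come from the disclosure. It uses Python's standard `string.Template` for substitution.

```python
from dataclasses import dataclass
from string import Template

# Hypothetical crawler script template; the placeholders are illustrative.
NEWS_IMAGE_TEMPLATE = Template(
    "crawl(start_url='$target_url', data_type='$data_type')"
)


@dataclass(frozen=True)
class AcquisitionDemand:
    target_url: str  # information of the target webpage
    data_type: str   # target data type, e.g. "picture"


@dataclass(frozen=True)
class Task:
    name: str
    script: str      # script generated from the preset template


def create_task(demand: AcquisitionDemand, template: Template) -> Task:
    """Fill the preset script template with the demand's information."""
    script = template.substitute(
        target_url=demand.target_url, data_type=demand.data_type
    )
    return Task(name=f"{demand.data_type}@{demand.target_url}", script=script)


task = create_task(
    AcquisitionDemand("https://news.example.com", "picture"),
    NEWS_IMAGE_TEMPLATE,
)
print(task.script)
```

`Template.substitute` raises `KeyError` when a placeholder is left unfilled, which surfaces an incomplete demand at task creation time rather than on the acquisition node.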
S102, obtain the operation resources required by the task and the operation resource status of multiple acquisition nodes.
The operation resources are the resources needed to run the task and may include at least one of physical resources, network resources and time resources; the present disclosure does not limit this. An acquisition node is a node that runs data acquisition tasks and may be the acquisition node 130 in the data acquisition system of Fig. 1. Once the task is determined, the operation resources it requires are determined as well. By obtaining the operation resources required by the task and the operation resource status of the multiple acquisition nodes, an acquisition node whose operation resources match the task can be selected for it, which improves the data acquisition efficiency of the multiple acquisition nodes.
S103, determine a target node from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes.
Optionally, the operation resource status may include one or a combination of the following parameters: parameters of physical resources, parameters of network conditions, and parameters of running time. The parameters of physical resources may include parameters of the CPU, memory, network interface card and the like. The parameters of network conditions may include the type of network accessed, the rate, the throughput and the like. The parameters of running time may include the start time of a run, the end time of a run, the duration of a run and the like. Depending on the actual application, other parameters may also be included; the present disclosure does not limit the operation resource status.
When determining the target node, at least one acquisition node that satisfies the operation resources needed to run the task can first be determined from the multiple acquisition nodes, and the target node is then determined from that at least one acquisition node, so that a target node can be selected from the multiple acquisition nodes for each task. Optionally, the target node may be selected at random from the at least one acquisition node that satisfies the operation resources needed to run the task. Alternatively, according to a preset target node selection mode, an acquisition node is determined as the target node when the operation resources needed to run the task and the operation resources of the acquisition node satisfy a preset condition. For example, if the operation resources needed by task M1 are P and the operation resources of an acquisition node are Q, the acquisition node is determined as the target node of the task when P*110% ≤ Q ≤ P*150%. It should be noted that the present disclosure does not limit the manner of determining the target node, which can be set according to the actual application.
Optionally, a preset scheduling engine may determine the target node from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes.
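The example condition P*110% ≤ Q ≤ P*150% can be sketched as a small selection routine. The sketch treats the required resource P and each node's available resource Q as single numbers, as the example in the text does. The function names, the node names and the choice to pick at random among matching nodes are assumptions for illustration, not the disclosure's scheduling engine.

```python
import random
from typing import Dict, Optional


def matches(required: float, available: float,
            low: float = 1.10, high: float = 1.50) -> bool:
    """True when required * low <= available <= required * high."""
    return required * low <= available <= required * high


def select_target_node(required: float, nodes: Dict[str, float],
                       rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick a target node whose available resource Q fits the band around P.

    `nodes` maps a (hypothetical) node name to its available resource Q.
    Returns None when no acquisition node satisfies the condition.
    """
    candidates = sorted(name for name, q in nodes.items()
                        if matches(required, q))
    if not candidates:
        return None
    # Assumption: choose at random among the nodes that match.
    return (rng or random).choice(candidates)


target = select_target_node(
    100.0, {"node-a": 105.0, "node-b": 130.0, "node-c": 200.0}
)
print(target)  # node-b is the only node inside the 110%..150% band
```

The upper bound keeps a small task from occupying a node with far more free capacity than it needs, leaving that node available for heavier tasks. Because the bounds are floating-point products, a value lying exactly on a bound may compare unexpectedly; a real implementation would likely use integer units or a tolerance.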
S104, dispatch the task to the target node, so that the target node acquires webpage data according to the task.
After the target node is determined for the task by executing S103, the task can be dispatched to the target node, and the target node then acquires data according to the dispatched task.
Optionally, when the task is a crawler task, that is, a data acquisition task, the crawler task is dispatched to the target node, and the target node can then acquire the relevant webpage data from each webpage according to the crawler task.
In summary, the webpage data acquisition method provided by the present disclosure creates a task corresponding to a data acquisition demand according to the information of that demand and a preset script template, so that a corresponding task can be created for each data acquisition demand and its preset script template. The method then obtains the operation resources required by the task and the operation resource status of multiple acquisition nodes, determines from the multiple acquisition nodes an acquisition node that matches the task as the target node, and dispatches the task to the target node so that the target node acquires data according to the task. The operation resources required by the task are thereby associated with the operation resource status of the acquisition nodes, which enables intelligent scheduling of acquisition nodes based on the data acquisition demand and allows a suitable acquisition node to be selected for the task. When applied to the acquisition of massive amounts of network data, the method can improve the data acquisition efficiency of tasks and make efficient use of acquisition node resources.
Fig. 3 is a flow diagram of another webpage data acquisition method provided by an embodiment of the present disclosure. Optionally, as shown in Fig. 3, before creating the task corresponding to the data acquisition demand according to the information of the data acquisition demand and the preset script template, the method includes:
S201, determine, according to the data acquisition demand, the script template corresponding to the data acquisition demand from a preset script template library as the preset script template.
The script template library includes script templates corresponding to at least one data acquisition demand, and different data acquisition demands correspond to different script templates.
The preset script template library may include at least one script template, and each script template may correspond to one kind of data acquisition demand. The script template corresponding to a given data acquisition demand can therefore be selected from the preset script template library, which avoids rewriting a separate data acquisition script for each demand, improves the efficiency of creating the task for a data acquisition demand, and in turn improves the acquisition efficiency of the data acquisition nodes. Optionally, each preset script template in the library may be programmed by a user for one kind of data acquisition demand and uploaded to the data acquisition system.
In addition, acquiring demand according to different data, configuration selects corresponding script mould in preset script template library The reuse and unified management of script template may be implemented in plate, can be further improved the efficiency of creation task, and then improve data The working efficiency of acquisition node, the collecting webpage data script style for avoiding different user from being write is different, cannot achieve to net The reuse and management of page data acquisition script.
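As a minimal sketch of selecting a script template for a data acquisition demand and creating a task from it, the example below keys a small in-memory library by the target data type. The library contents, the demand fields (`data_type`, `url`, `selector`) and the pseudo-script syntax inside the templates are all hypothetical; the disclosure does not define a template format.

```python
from string import Template

# Hypothetical script template library: one template per kind of demand.
SCRIPT_TEMPLATE_LIBRARY = {
    "article_text": Template("fetch('$url'); extract_text(selector='$selector')"),
    "image_links": Template("fetch('$url'); extract_attr(selector='img', attr='src')"),
}


def create_task(demand: dict) -> dict:
    """Create a task by filling the template that matches the demand.

    `demand` is assumed to carry the target data type plus the values
    the chosen template needs (for example the target page URL).
    """
    template = SCRIPT_TEMPLATE_LIBRARY.get(demand["data_type"])
    if template is None:
        raise LookupError(f"no script template for demand {demand['data_type']!r}")
    # substitute() raises KeyError if the demand lacks a required value.
    script = template.substitute(demand)
    return {"task_class": demand["data_type"], "script": script}
```

Because every task of a given class is generated from the same template, the scripts stay uniform in style and can be maintained in one place.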
Optionally, after the task corresponding to the data acquisition demand has been created according to the information of the data acquisition demand and the preset script template, the task can be started and stopped manually or on a timer. The disclosure does not limit how the task is started and stopped; a suitable manner can be chosen according to the actual application scenario. After the task is started, a preset scheduling engine can, as in step S103 above, determine the destination node from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes. For details, refer to the description of that step, which is not repeated here.
Fig. 4 is a flow diagram of another webpage data acquiring method provided by an embodiment of the disclosure. Optionally, as shown in Fig. 4, obtaining the operation resources required by the task and the operation resource status of multiple acquisition nodes includes:
S301, determining, according to the information of the task, the operation resources required by the task from a preset task learning resource library.
The task learning resource library includes information on the operation resources required by at least one task in the class to which the task belongs.
The preset task learning resource library may include, for each class of tasks, information on the operation resources required by at least one task. For a task created according to a data acquisition demand, the operation resources needed to run it can therefore be determined from the preset task learning resource library. Optionally, the task learning resource library can be updated at a preset update period so that it contains information on the operation resources required by multiple tasks; the operation resources required by a task can then be determined from the library according to the information of that task.
For example, suppose the operation resources comprise physical resources (CPU, memory and network interface card), network status resources and running time resources, and the task learning resource library records that running task M1 requires physical resources A1, network status resources B1 and running time resources C1, and that running task M2 requires physical resources A2, network status resources B2 and running time resources C2. If the task created according to the information of the data acquisition demand and the preset script template is M2, then, according to the information of the task, it can be learned from the preset task learning resource library that running crawler task M2 requires physical resources A2, network status resources B2 and running time resources C2.
S302, obtaining the operation resource status from the multiple acquisition nodes.
Depending on its actual workload, each acquisition node may have a different operation resource status in different time periods. Optionally, the operation resource status can be obtained from the multiple acquisition nodes at a preset time period, so that the latest operation resource status of the multiple acquisition nodes is available. The preset period may be 1 minute, 5 minutes and so on, and the disclosure does not limit it.
Optionally, before the operation resources required by the task are determined from the preset task learning resource library according to the information of the task, the method further includes: determining the operation resources required by each task according to each task, in the class to which the task belongs, that ran on the multiple acquisition nodes within a historical time period and the operation resource status of each acquisition node; and obtaining the task learning resource library according to the information on the operation resources required by at least one task.
By comparing the tasks that ran on the multiple acquisition nodes within the historical time period with the operation resources of those nodes, the operation resources required by each task can be determined. The task learning resource library can then be obtained from this information; that is, the information on the operation resources required by each task can be updated in the task learning resource library accordingly. Optionally, when step S103 is executed and the destination node is determined from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes, the acquisition node that best satisfies the operation resources required by the task can be selected as the destination node, so as to improve the data acquisition efficiency of the task and make efficient use of acquisition node resources.
For example, assume that acquisition nodes 1 and 2 have identical initial physical resources, that tasks M1 and M2 run on acquisition node 1, and that tasks M1, M2 and M3 run on acquisition node 2. By comparing the tasks running on acquisition node 1 with those running on acquisition node 2, the physical resources (CPU, memory, network interface card and so on) used by task M3 can be estimated. While task M3 is running, the network status can also be monitored continuously to obtain the network resources required by task M3, and when task M3 finishes, the time resources it required can be obtained. In this way, the operation resources required for task M3, created with the preset script template, to run efficiently can be determined.
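The comparison in this example can be sketched as a simple subtraction: with identical nodes, the usage observed on the node running M1, M2 and M3 minus the usage on the node running only M1 and M2 approximates what M3 consumes. The metric names and the numbers below are invented for illustration, and the subtraction assumes that M1 and M2 behave the same on both nodes.

```python
from typing import Dict


def estimate_task_usage(
    usage_with_task: Dict[str, float], usage_without_task: Dict[str, float]
) -> Dict[str, float]:
    """Estimate one task's usage as the difference between two nodes.

    Negative differences (measurement noise) are clamped to zero.
    """
    return {
        metric: max(value - usage_without_task.get(metric, 0.0), 0.0)
        for metric, value in usage_with_task.items()
    }


# Illustrative measurements, not data from the disclosure.
node1_usage = {"cpu_percent": 35.0, "memory_mb": 900.0, "nic_mbps": 12.0}   # runs M1, M2
node2_usage = {"cpu_percent": 50.0, "memory_mb": 1400.0, "nic_mbps": 20.0}  # runs M1, M2, M3

# Record the estimate in a (hypothetical) task learning resource library.
task_learning_library: Dict[str, Dict[str, float]] = {}
task_learning_library["M3"] = estimate_task_usage(node2_usage, node1_usage)
```

A periodic job could refresh such entries at the preset update period, so that later scheduling decisions use recent estimates.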
Fig. 5 is a flow diagram of another webpage data acquiring method provided by an embodiment of the disclosure. Optionally, as shown in Fig. 5, obtaining the operation resource status from the multiple acquisition nodes includes:
S401, sending heartbeat requests to the multiple acquisition nodes at a preset time period.
When the operation resource status of each of the multiple acquisition nodes needs to be obtained, heartbeat requests can be sent to the multiple acquisition nodes at the preset time period to request their operation resource status. The preset time period may be 1 minute, 5 minutes and so on, and can be adjusted according to the actual application scenario; the disclosure does not limit its specific value.
S402, receiving the heartbeat response sent by each acquisition node, the heartbeat response including the working state of the acquisition node and the operation resource status of the acquisition node.
When an acquisition node receives the heartbeat request, it can send its working state and operation resource status to the task scheduling server, which then receives the heartbeat response sent by each acquisition node. Accordingly, the heartbeat response includes the working state and the operation resource status of each acquisition node, where the working state of an acquisition node may be a normal working state or a fault state.
With this webpage data acquiring method, the latest operation resource status of the multiple acquisition nodes can be obtained. When the destination node is determined according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes, the determination can therefore be based on the latest operation resource status, so that a suitable destination node is selected for the task.
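The heartbeat exchange of S401 and S402 can be sketched as a polling loop on the task scheduling server. The transport is abstracted away: each node is represented by a callable that returns its response, and all names (`HeartbeatResponse`, `HeartbeatMonitor`) are hypothetical. Treating a failed request as a fault state is an assumption of this sketch, not something the disclosure states.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class HeartbeatResponse:
    node_id: str
    working_state: str                 # "normal" or "fault"
    resource_status: Dict[str, float]  # e.g. free CPU, memory, bandwidth


class HeartbeatMonitor:
    """Keeps the latest heartbeat response of every acquisition node."""

    def __init__(self, nodes: Dict[str, Callable[[], HeartbeatResponse]]):
        # Maps node id to a callable that performs one heartbeat request.
        self.nodes = nodes
        self.latest: Dict[str, HeartbeatResponse] = {}

    def poll_once(self) -> None:
        for node_id, send_heartbeat in self.nodes.items():
            try:
                self.latest[node_id] = send_heartbeat()
            except Exception:
                # Assumption: an unanswered heartbeat marks the node as faulty.
                self.latest[node_id] = HeartbeatResponse(node_id, "fault", {})

    def run(self, period_seconds: float, rounds: int) -> None:
        """Poll all nodes `rounds` times, waiting one period between rounds."""
        for _ in range(rounds):
            self.poll_once()
            time.sleep(period_seconds)
```

A scheduler would read `latest` when choosing a destination node and skip any node whose working state is not normal.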
Fig. 6 is a flow diagram of another webpage data acquiring method provided by an embodiment of the disclosure. Optionally, as shown in Fig. 6, the above method may further include:
S501, receiving the operation resource status and the execution state of the task that the destination node sends at a preset time period while performing data acquisition.
After the destination node has been determined from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes, the task scheduling server also receives the operation resource status and the execution state of the task that the destination node sends at the preset time period while performing data acquisition, so that the server can track the working state of the destination node and the running state of the task in real time. Optionally, the operation resource status may include parameters of the physical resources of the destination node, parameters of the network status, parameters of the running time and so on, which the disclosure does not limit. The execution state of the task may include task in progress, task terminated abnormally and task completed normally, and may further include other states depending on the actual application, which the disclosure does not limit.
Optionally, if the execution state of the task includes task abort information, the task abort information indicating that the data acquisition corresponding to the task has not been completed but the task has terminated, the method further includes: receiving the running log of the task returned by the destination node, the running log including the operating parameters of the task during its run.
If the task scheduling server receives a task abnormality report from the destination node, it also receives the running log of the task returned by the destination node. From the returned running log the user can learn the operating parameters of the task during its run, which makes it convenient to inspect, analyze and modify the operating parameters of the crawler task in which the abnormality occurred, facilitates later maintenance of the task or the preset script template, and improves maintainability.
Fig. 7 is a flow diagram of another webpage data acquiring method provided by an embodiment of the disclosure. Optionally, as shown in Fig. 7, the above method further includes:
S601, receiving the end information of the task sent by the destination node.
When the destination node performs data acquisition according to the task and the task ends normally, the task scheduling server also receives the end information of the task sent by the destination node, which identifies the normal end of the task.
S602, performing a cleaning and/or warehousing operation on the data obtained by the data acquisition.
The data obtained by the destination node through data acquisition according to the task can further be subjected to data cleaning and/or a warehousing operation. Data cleaning refers to the process of re-examining and verifying the data, with the aim of deleting duplicate information, correcting existing errors and ensuring data consistency. Data warehousing refers to loading the acquired data into a preset database. The disclosure limits neither the manner of data cleaning nor the type of database used for storage; both can be selected according to the application.
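A minimal sketch of the cleaning and database loading steps follows. Since the disclosure leaves both the cleaning rules and the database open, everything concrete here is an assumption: records are dictionaries with `url` and `title` fields, cleaning means trimming whitespace and dropping empty or duplicate URLs, and the database is SQLite from the Python standard library.

```python
import sqlite3
from typing import Dict, Iterable, List


def clean(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Normalize whitespace and drop records with empty or repeated URLs."""
    seen = set()
    cleaned = []
    for record in records:
        url = (record.get("url") or "").strip()
        title = " ".join((record.get("title") or "").split())
        if not url or url in seen:
            continue  # delete invalid or duplicate information
        seen.add(url)
        cleaned.append({"url": url, "title": title})
    return cleaned


def load(records: List[Dict[str, str]], conn: sqlite3.Connection) -> None:
    """Load cleaned records into a table; the schema is illustrative."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO pages (url, title) VALUES (:url, :title)",
        records,
    )
    conn.commit()
```

The primary key on `url` keeps the table consistent even if the same page is acquired again by a later task.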
Fig. 8 is a flow diagram of another webpage data acquiring method provided by an embodiment of the disclosure. Optionally, as shown in Fig. 8, the method includes:
S801, according to the data acquisition demand, determining, from a preset script template library, the script template corresponding to the data acquisition demand as the preset script template.
S802, creating the task corresponding to the data acquisition demand according to the information of the data acquisition demand and the preset script template.
S803, obtaining the operation resources required by the task and the operation resource status of multiple acquisition nodes.
S804, determining the destination node from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes.
S805, dispatching the task to the destination node, so that the destination node acquires web page data according to the task.
S806, receiving the operation resource status and the execution state of the task that the destination node sends at a preset time period while performing data acquisition.
S807, judging whether the execution state of the task includes task abort information.
S808, receiving the running log of the task returned by the destination node, the running log including the operating parameters of the task during its run.
S809, receiving the end information of the task sent by the destination node, and performing a cleaning and/or warehousing operation on the data obtained by the data acquisition.
It should be noted that S808 is executed if the execution state of the task includes task abort information, and S809 is executed otherwise. It should also be noted that the basic principle of this embodiment and the technical effect it produces are the same as those of the corresponding method embodiments above; for details, refer to the corresponding content in the method embodiments, which is not repeated here.
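The branch between S808 and S809 can be sketched as a small dispatcher on the task scheduling server. The report format, the state names `"aborted"` and `"finished"`, and the three callbacks are hypothetical; they stand in for whatever messages and services an implementation actually uses.

```python
from typing import Any, Callable, Dict, Tuple


def handle_execution_state(
    report: Dict[str, Any],
    fetch_running_log: Callable[[str], Any],
    collect_results: Callable[[str], Any],
    clean_and_store: Callable[[Any], Any],
) -> Tuple[str, Any]:
    """Route a periodic status report (S806) to S808 or S809.

    `report` is assumed to carry a task id and an execution state.
    """
    task_id = report["task_id"]
    state = report["execution_state"]
    if state == "aborted":
        # S807 yes-branch, then S808: keep the running log for diagnosis.
        return "aborted", fetch_running_log(task_id)
    if state == "finished":
        # S809: the task ended normally, so clean and store its data.
        return "stored", clean_and_store(collect_results(task_id))
    # Task still in progress: nothing to do until the next report.
    return "running", None
```

Passing the follow-up actions in as callables keeps the routing logic independent of how logs and results are actually transported.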
Fig. 9 is a structural schematic diagram of a webpage data acquiring device provided by an embodiment of the disclosure. The basic principle of the device and the technical effect it produces are the same as those of the corresponding method embodiments above; for brevity, for anything not mentioned in this embodiment, refer to the corresponding content in the method embodiments. As shown in Fig. 9, the device includes a creation module 210, an obtaining module 220, a determining module 230 and an acquisition module 240. The creation module 210 is configured to create the task corresponding to the data acquisition demand according to the information of the data acquisition demand and a preset script template. The obtaining module 220 is configured to obtain the operation resources required by the task and the operation resource status of multiple acquisition nodes. The determining module 230 is configured to determine the destination node from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes. The acquisition module 240 is configured to dispatch the task to the destination node, so that the destination node acquires web page data according to the task.
Optionally, the data acquisition demand in the above device may include the information of the target web page and/or the type of the target data. Optionally, the operation resource status in the above device may include one or a combination of the following parameters: parameters of physical resources, parameters of network status, and parameters of running time.
Optionally, the above creation module 210 is specifically configured to determine, according to the data acquisition demand and from a preset script template library, the script template corresponding to the data acquisition demand as the preset script template. The script template library includes a script template corresponding to at least one data acquisition demand, and different data acquisition demands correspond to different script templates.
Optionally, the above obtaining module 220 is specifically configured to determine, according to the information of the task, the operation resources required by the task from a preset task learning resource library, the task learning resource library including information on the operation resources required by at least one task in the class to which the task belongs; and to obtain the operation resource status from the multiple acquisition nodes.
Optionally, the above determining module 220 is specifically configured to determine the operation resources required by each task according to each task, in the class to which the task belongs, that ran on the multiple acquisition nodes within a historical time period and the operation resource status of each acquisition node; and to obtain the task learning resource library according to the information on the operation resources required by at least one task.
Optionally, the above obtaining module 220 is specifically configured to send heartbeat requests to the multiple acquisition nodes at a preset time period, and to receive the heartbeat response sent by each acquisition node, the heartbeat response including the working state of each acquisition node and the operation resource status of each acquisition node.
Fig. 10 is a structural schematic diagram of another webpage data acquiring device provided by an embodiment of the disclosure. Optionally, as shown in Fig. 10, the above device further includes a first receiving module 250, configured to receive the operation resource status and the execution state of the task that the destination node sends at a preset time period while performing data acquisition.
Optionally, if the execution state of the task includes task abort information, the task abort information indicating that the data acquisition corresponding to the task has not been completed but the task has terminated, the above first receiving module 250 is further configured to receive the running log of the task returned by the destination node, the running log including the operating parameters of the task during its run.
Fig. 11 is a structural schematic diagram of another webpage data acquiring device provided by an embodiment of the disclosure. Optionally, as shown in Fig. 11, the above device further includes a second receiving module 260 and an operation module 270. The second receiving module 260 is configured to receive the end information of the task sent by the destination node. The operation module 270 is configured to perform a cleaning and/or warehousing operation on the data obtained by the data acquisition.
The above device is used to execute the method provided by the foregoing embodiments; its implementation principle and technical effect are similar and are not repeated here.
The above modules may be one or more integrated circuits configured to implement the above method, for example one or more application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), one or more microprocessors (Digital Signal Processor, DSP), or one or more field-programmable gate arrays (Field Programmable Gate Array, FPGA). As another example, when one of the above modules is implemented in the form of a processing element that schedules program code, the processing element may be a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU) or another processor that can call program code. As yet another example, these modules may be integrated together and implemented in the form of a system on chip (system-on-a-chip, SOC).
Fig. 12 is a structural schematic diagram of a webpage data acquiring equipment provided by an embodiment of the disclosure. As shown in Fig. 12, the data acquisition equipment may include a processor 310, a storage medium 320 and a bus 330. The storage medium 320 stores machine-readable instructions executable by the processor 310. When the data acquisition equipment runs, the processor 310 and the storage medium 320 communicate through the bus 330, and the processor 310 executes the machine-readable instructions to implement the above method embodiments. The implementation principle and technical effect are similar and are not repeated here.
The data acquisition equipment shown in Fig. 12 may be the scheduling node in the webpage data acquisition system shown in Fig. 1, and may be a server or other computer equipment with a scheduling function.
Optionally, the disclosure also provides a storage medium on which a computer program is stored; when the computer program is read and run by a processor, the above method embodiments can be implemented.
In the several embodiments provided by the disclosure, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) or a processor to execute some of the steps of the methods described in the embodiments of the disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disc.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or equipment that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or equipment. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or equipment that includes the element.
The foregoing is merely a description of preferred embodiments of the disclosure and is not intended to limit the disclosure; for those skilled in the art, the disclosure may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the disclosure shall be included within the protection scope of the disclosure. It should also be noted that similar reference numerals and letters denote similar items in the accompanying drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.

Claims (10)

1. A webpage data acquiring method, characterized in that the method comprises:
creating, according to information of a data acquisition demand and a preset script template, a task corresponding to the data acquisition demand;
obtaining the operation resources required by the task and the operation resource status of multiple acquisition nodes;
determining a destination node from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes;
dispatching the task to the destination node, so that the destination node acquires web page data according to the task.
2. The method according to claim 1, characterized in that, before the task corresponding to the data acquisition demand is created according to the information of the data acquisition demand and the preset script template, the method comprises:
determining, according to the data acquisition demand and from a preset script template library, the script template corresponding to the data acquisition demand as the preset script template; the script template library comprises a script template corresponding to at least one data acquisition demand, and different data acquisition demands correspond to different script templates.
3. The method according to claim 1, characterized in that obtaining the operation resources required by the task and the operation resource status of multiple acquisition nodes comprises:
determining, according to the information of the task, the operation resources required by the task from a preset task learning resource library, the task learning resource library comprising information on the operation resources required by at least one task in the class to which the task belongs;
obtaining the operation resource status from the multiple acquisition nodes.
4. The method according to claim 3, characterized in that obtaining the operation resource status from the multiple acquisition nodes comprises:
sending heartbeat requests to the multiple acquisition nodes at a preset time period;
receiving the heartbeat response sent by each acquisition node, the heartbeat response comprising the working state of each acquisition node and the operation resource status of each acquisition node.
5. The method according to claim 1, characterized in that the method further comprises:
receiving the operation resource status and the execution state of the task that the destination node sends at a preset time period while performing data acquisition.
6. The method according to claim 5, characterized in that, if the execution state of the task comprises task abort information, the task abort information indicating that the data acquisition corresponding to the task has not been completed but the task has terminated, the method further comprises:
receiving the running log of the task returned by the destination node, the running log comprising the operating parameters of the task during its run.
7. The method according to any one of claims 1-6, characterized in that the method further comprises:
receiving the end information of the task sent by the destination node;
performing a cleaning and/or warehousing operation on the data obtained by the data acquisition.
8. A webpage data acquiring device, characterized in that the device comprises a creation module, an obtaining module, a determining module and an acquisition module;
the creation module is configured to create, according to information of a data acquisition demand and a preset script template, a task corresponding to the data acquisition demand;
the obtaining module is configured to obtain the operation resources required by the task and the operation resource status of multiple acquisition nodes;
the determining module is configured to determine a destination node from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes;
the acquisition module is configured to dispatch the task to the destination node, so that the destination node acquires web page data according to the task.
9. A webpage data acquiring equipment, characterized by comprising a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor; when the webpage data acquiring equipment runs, the processor and the storage medium communicate through the bus, and the processor executes the machine-readable instructions to perform the steps of the webpage data acquiring method according to any one of claims 1 to 7.
10. A storage medium, characterized in that a computer program is stored on the storage medium, and the computer program, when run by a processor, performs the steps of the webpage data acquiring method according to any one of claims 1 to 7.
CN201910627107.1A 2019-07-11 2019-07-11 Webpage data acquiring method, device, equipment and storage medium Pending CN110442766A (en)


Publications (1)

Publication Number Publication Date
CN110442766A true CN110442766A (en) 2019-11-12


Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090292691A1 (en) * 2008-05-21 2009-11-26 Sungkyunkwan University Foundation For Corporate Collaboration System and Method for Building Multi-Concept Network Based on User's Web Usage Data
CN101715004A (en) * 2009-11-12 2010-05-26 中国科学院计算技术研究所 Internet video-oriented distributed acquisition method and system
CN103745322A (en) * 2014-01-22 2014-04-23 云南电力调度控制中心 Province-city secondary system integrated comprehensive monitoring and process management system in power dispatching and implementation method for system
CN104735138A (en) * 2015-03-09 2015-06-24 中国科学院计算技术研究所 Distributed acquisition method and system oriented to user generated content
CN106372082A (en) * 2015-07-22 2017-02-01 克拉玛依红有软件有限责任公司 Single-file multi-form data automatic storage method and system
US20170024472A1 (en) * 2015-07-23 2017-01-26 Green Prestige Pte. Ltd. Information retrieval method utilizing webpage visual and language features and system using thereof
CN105302785A (en) * 2015-09-24 2016-02-03 金蝶软件(中国)有限公司 Data collection method and system
CN106570011A (en) * 2015-10-09 2017-04-19 北京京东尚科信息技术有限公司 Distributed crawler URL seed distribution method, dispatching node, and grabbing node
CN107562541A (en) * 2017-09-05 2018-01-09 广东科杰通信息科技有限公司 Load-balancing distributed crawler method and crawler system
CN107918674A (en) * 2017-12-12 2018-04-17 携程旅游网络技术(上海)有限公司 Acquisition method and its system, storage medium, the electronic equipment of web data
CN108304498A (en) * 2018-01-12 2018-07-20 深圳壹账通智能科技有限公司 Webpage data acquiring method, device, computer equipment and storage medium
CN108551404A (en) * 2018-04-20 2018-09-18 北京百度网讯科技有限公司 Method, apparatus, storage medium and the terminal device of client-side information analysis
CN108768791A (en) * 2018-07-04 2018-11-06 山东汇贸电子口岸有限公司 A kind of information collection configuration management system and method
CN109523446A (en) * 2018-10-19 2019-03-26 北京北大软件工程股份有限公司 A kind of big data processing analysis system towards price field
CN109814992A (en) * 2018-12-29 2019-05-28 中国科学院计算技术研究所 Distributed dynamic dispatching method and system for the acquisition of large scale network data
CN109740038A (en) * 2019-01-02 2019-05-10 安徽芃睿科技有限公司 Network data distributed parallel computing environment and method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111147325A (en) * 2019-12-18 2020-05-12 深圳市任子行科技开发有限公司 Distributed network information acquisition system, method and computer readable storage medium
CN111708931A (en) * 2020-06-06 2020-09-25 谢国柱 Big data acquisition method based on mobile internet and artificial intelligence cloud service platform
CN111708931B (en) * 2020-06-06 2020-12-25 湖南伟业动物营养集团股份有限公司 Big data acquisition method based on mobile internet and artificial intelligence cloud service platform
CN114793194A (en) * 2022-03-09 2022-07-26 中国邮政储蓄银行股份有限公司 Service data processing method and device and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN104067257B (en) Automate event management system, management event method and event management system
Lind Iterative software engineering for multiagent systems: the MASSIVE method
US20200272911A1 (en) A cognitive automation engineering system
CN108353090A (en) Edge intelligence platform and internet of things sensors streaming system
CN110442766A (en) Webpage data acquiring method, device, equipment and storage medium
CN108243012B (en) Charging application processing system, method and device in OCS (online charging System)
Cui et al. Scenario analysis of web service composition based on multi-criteria mathematical goal programming
Marzolla Simulation-based performance modeling of UML software architectures.
Qian et al. A workflow-aided Internet of things paradigm with intelligent edge computing
Leitao et al. A survey on factors that impact industrial agent acceptance
Wagner Tutorial: Information and process modeling for simulation
CN108959488A (en) Safeguard the method and device of Question-Answering Model
EP4024761A1 (en) Communication method and apparatus for multiple management domains
Hopp et al. A diagnostic tree for improving production line performance
Virani et al. Service composition based on multi agent in cloud environment
US11816548B2 (en) Distributed learning using ensemble-based fusion
Nguyen et al. Real-time optimisation for industrial internet of things (IIoT): Overview, challenges and opportunities
Wolf Succeedings of the second international software architecture workshop (isaw-2)
Cicirelli et al. Using time stream Petri nets for workflow modelling analysis and enactment
Tuli et al. Optimizing the Performance of Fog Computing Environments Using AI and Co-Simulation
CN109118151B (en) Work order transaction processing method and work order transaction processing system
Khelifati et al. A multi-agent approach for scheduling jobs and maintenance operations in the flowshop sequencing problem
Maamar et al. How to Make Business Processes" Socialize"?
Garcés et al. Towards an architectural patterns language for systems-of-systems
Tapia et al. Organizations of agents in information fusion environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191112