CN110442766A - Webpage data acquiring method, device, equipment and storage medium - Google Patents
Webpage data acquiring method, device, equipment and storage medium
- Publication number
- CN110442766A (application CN201910627107.1A)
- Authority
- CN
- China
- Prior art keywords
- task
- acquisition
- data
- node
- operation resource
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
- H04L67/1008—Server selection for load balancing based on parameters of servers, e.g. available memory or workload
Abstract
The disclosure provides a webpage data acquiring method, device, equipment and storage medium, and relates to the technical field of data processing. The method creates a task corresponding to a data acquisition demand according to the information of the data acquisition demand and a preset script template, so that corresponding tasks can be created for different data acquisition demands and their corresponding preset script templates. The method then obtains the operation resource required by the task and the operation resource status of multiple acquisition nodes, determines from the multiple acquisition nodes an acquisition node that matches the task as the destination node, and dispatches the task to the destination node so that the destination node acquires data according to the task. Because the operation resource required by the task is associated with the operation resource status of the acquisition nodes, a suitable acquisition node can be selected for the task, which improves the efficiency with which the acquisition node executes the task, that is, the data acquisition efficiency.
Description
Technical field
This disclosure relates to the technical field of data processing, and in particular to a webpage data acquiring method, device, equipment and storage medium.
Background technique
A web crawler is a program or script that automatically grabs web information according to preset rules. With web crawler technology, different kinds of data such as pictures, databases and audio/video multimedia can be searched purposefully, so that, according to set crawling conditions, data with a certain structure can be found and obtained from this information content in a concentrated way. Web crawlers are widely used for information crawling by the major search engines.
For websites with massive data, such as news and social websites, a distributed crawler system is generally used. A distributed crawler system provides stronger crawling capability than a single-point crawler system. It works by task scheduling: a scheduling node in the system dispatches tasks to working nodes, which crawl the data. In existing systems, the task scheduling node selects the working node that executes a data acquisition task at random.
However, because the task scheduling node in an existing distributed crawler system selects the working node for a data acquisition task at random, the task execution efficiency of the nodes is low.
Summary of the invention
The purpose of the disclosure is to address the above deficiency of the prior art by providing a webpage data acquiring method, device, equipment and storage medium, so as to solve the problem of low node task execution efficiency in the prior art.
To achieve the above purpose, the embodiments of the present disclosure adopt the following technical solutions:
In a first aspect, an embodiment of the present disclosure provides a webpage data acquiring method, the method comprising: creating a task corresponding to a data acquisition demand according to the information of the data acquisition demand and a preset script template; obtaining the operation resource required by the task and the operation resource status of multiple acquisition nodes; determining a destination node from the multiple acquisition nodes according to the operation resource required by the task and the operation resource status of the multiple acquisition nodes; and dispatching the task to the destination node, so that the destination node acquires webpage data according to the task.
Optionally, before creating the task corresponding to the data acquisition demand according to the information of the data acquisition demand and the preset script template, the method comprises: determining, according to the data acquisition demand, the script template corresponding to the data acquisition demand from a preset script template library as the preset script template. The script template library includes a script template corresponding to at least one data acquisition demand, and different data acquisition demands correspond to different script templates.
Optionally, the operation resource status of the operation resource of above-mentioned acquisition required by task and multiple acquisition nodes, comprising: root
According to the information of task, from preset tasking learning resources bank, the operation resource of required by task, tasking learning resources bank packet are determined
Include: at least one task corresponds to the information of required operation resource in class where the task;Institute is obtained from multiple acquisition nodes
State operation resource status.
Optionally, before determining the operation resource required by the task from the preset task learning resource library according to the information of the task, the method further comprises: determining the operation resource required by each task according to each task, in the class to which the task belongs, that ran on the multiple acquisition nodes within a historical time period and the operation resource status of each acquisition node; and obtaining the task learning resource library according to the information on the operation resource required by the at least one task.
Optionally, obtaining the operation resource status from the multiple acquisition nodes comprises: sending heartbeat requests to the multiple acquisition nodes according to a preset time period; and receiving the heartbeat response sent by each acquisition node, where the heartbeat response includes the working state of the acquisition node and the operation resource status of the acquisition node.
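The heartbeat exchange described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the names `HeartbeatResponse`, `poll_heartbeats` and `make_fake_node`, the resource fields, and the use of plain callables in place of real network requests are all hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class HeartbeatResponse:
    """Hypothetical heartbeat response of one acquisition node."""
    node_id: str
    working: bool          # working state of the acquisition node
    cpu_free_cores: float  # illustrative operation resource status fields
    mem_free_mb: float


def poll_heartbeats(
    senders: Dict[str, Callable[[], HeartbeatResponse]],
) -> Dict[str, HeartbeatResponse]:
    """Send one heartbeat request to every node and collect the responses.

    Each value in `senders` stands in for "send a heartbeat request to this
    node and wait for its response". Nodes that cannot be reached are simply
    left out of the result (an assumption of this sketch).
    """
    statuses: Dict[str, HeartbeatResponse] = {}
    for node_id, send_heartbeat in senders.items():
        try:
            statuses[node_id] = send_heartbeat()
        except ConnectionError:
            continue
    return statuses


def make_fake_node(node_id: str, cpu: float, mem: float) -> Callable[[], HeartbeatResponse]:
    """Build an in-memory stand-in for a reachable acquisition node."""
    def send_heartbeat() -> HeartbeatResponse:
        return HeartbeatResponse(node_id, True, cpu, mem)
    return send_heartbeat
```

A scheduling node would call `poll_heartbeats` once per preset time period, for example from a timer loop, and keep the latest result as the current operation resource status of the acquisition nodes.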
Optionally, the method further comprises: receiving the operation resource status and the execution state of the task that the destination node sends at a preset time period while carrying out data acquisition.
Optionally, if the execution state of the task includes abort information, where the abort information indicates that the data acquisition corresponding to the task has not been completed but the task has terminated, the method further comprises: receiving a running log of the task returned by the destination node, where the running log includes the operating parameters of the task during its run.
Optionally, the method further comprises: receiving end information of the task sent by the destination node; and performing cleaning and/or storage operations on the data obtained by the data acquisition.
Optionally, the operation resource status includes one or a combination of the following parameters: parameters of physical resources, parameters of network conditions, and parameters of running time.
Optionally, the data acquisition demand includes: information of a target webpage, and/or a target data type.
In a second aspect, an embodiment of the present disclosure further provides a webpage data acquisition device, the device comprising a creation module, an obtaining module, a determining module and an acquisition module. The creation module is configured to create a task corresponding to a data acquisition demand according to the information of the data acquisition demand and a preset script template. The obtaining module is configured to obtain the operation resource required by the task and the operation resource status of multiple acquisition nodes. The determining module is configured to determine a destination node from the multiple acquisition nodes according to the operation resource required by the task and the operation resource status of the multiple acquisition nodes. The acquisition module is configured to dispatch the task to the destination node, so that the destination node acquires webpage data according to the task.
Optionally, the creation module is specifically configured to determine, according to the data acquisition demand, the script template corresponding to the data acquisition demand from a preset script template library as the preset script template. The script template library includes a script template corresponding to at least one data acquisition demand, and different data acquisition demands correspond to different script templates.
Optionally, the obtaining module is specifically configured to determine, according to the information of the task, the operation resource required by the task from a preset task learning resource library, where the task learning resource library includes information on the operation resource required by at least one task in the class to which the task belongs; and to obtain the operation resource status from the multiple acquisition nodes.
Optionally, the determining module is specifically configured to determine the operation resource required by each task according to each task, in the class to which the task belongs, that ran on the multiple acquisition nodes within a historical time period and the operation resource status of each acquisition node; and to obtain the task learning resource library according to the information on the operation resource required by the at least one task.
Optionally, the obtaining module is specifically configured to send heartbeat requests to the multiple acquisition nodes according to a preset time period, and to receive the heartbeat response sent by each acquisition node, where the heartbeat response includes the working state of the acquisition node and the operation resource status of the acquisition node.
Optionally, the device further includes a first receiving module, configured to receive the operation resource status and the execution state of the task that the destination node sends at a preset time period while carrying out data acquisition.
Optionally, if the execution state of the task includes abort information, where the abort information indicates that the data acquisition corresponding to the task has not been completed but the task has terminated, the first receiving module is further configured to receive a running log of the task returned by the destination node, where the running log includes the operating parameters of the task during its run.
Optionally, the device further includes a second receiving module and an operation module. The second receiving module is configured to receive end information of the task sent by the destination node. The operation module is configured to perform cleaning and/or storage operations on the data obtained by the data acquisition.
Optionally, the operation resource status includes one or a combination of the following parameters: parameters of physical resources, parameters of network conditions, and parameters of running time.
Optionally, the data acquisition demand includes: information of a target webpage, and/or a target data type.
In a third aspect, an embodiment of the present disclosure further provides webpage data acquisition equipment, comprising a processor, a storage medium and a bus. The storage medium stores machine-readable instructions executable by the processor. When the data acquisition equipment runs, the processor and the storage medium communicate over the bus, and the processor executes the machine-readable instructions to perform the steps of the webpage data acquiring method of the first aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a storage medium on which a computer program is stored. When run by a processor, the computer program performs the steps of the webpage data acquiring method of the first aspect.
The beneficial effect of the disclosure is:
The embodiments of the present disclosure provide a webpage data acquiring method, device, equipment and storage medium. The method creates a task corresponding to a data acquisition demand according to the information of the data acquisition demand and a preset script template, so that corresponding tasks can be created for different data acquisition demands and their corresponding preset script templates. The method then obtains the operation resource required by the task and the operation resource status of multiple acquisition nodes, determines from the multiple acquisition nodes an acquisition node that matches the task as the destination node, and dispatches the task to the destination node so that the destination node acquires webpage data according to the task. Because the operation resource required by the task is associated with the operation resource status of the acquisition nodes, a suitable acquisition node can be selected for the task, which improves the efficiency with which the acquisition node executes the task, that is, the data acquisition efficiency.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the disclosure and should therefore not be regarded as limiting its scope. Those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is a block diagram of a webpage data acquisition system provided by an embodiment of the present disclosure;
Fig. 2 is a flow diagram of a webpage data acquiring method provided by an embodiment of the present disclosure;
Fig. 3 is a flow diagram of another webpage data acquiring method provided by an embodiment of the present disclosure;
Fig. 4 is a flow diagram of another webpage data acquiring method provided by an embodiment of the present disclosure;
Fig. 5 is a flow diagram of another webpage data acquiring method provided by an embodiment of the present disclosure;
Fig. 6 is a flow diagram of another webpage data acquiring method provided by an embodiment of the present disclosure;
Fig. 7 is a flow diagram of another webpage data acquiring method provided by an embodiment of the present disclosure;
Fig. 8 is a flow diagram of another webpage data acquiring method provided by an embodiment of the present disclosure;
Fig. 9 is a structural schematic diagram of a webpage data acquisition device provided by an embodiment of the present disclosure;
Fig. 10 is a structural schematic diagram of another webpage data acquisition device provided by an embodiment of the present disclosure;
Fig. 11 is a structural schematic diagram of another webpage data acquisition device provided by an embodiment of the present disclosure;
Fig. 12 is a structural schematic diagram of webpage data acquisition equipment provided by an embodiment of the present disclosure.
Detailed description of the embodiments
To make the purposes, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some of the embodiments of the disclosure, not all of them. The components of the embodiments, as generally described and illustrated in the drawings herein, can be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the disclosure, but merely represents selected embodiments of the disclosure. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the disclosure without creative effort fall within the scope of protection of the disclosure.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings, so once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings. In the embodiments of the present disclosure, "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
Fig. 1 is a block diagram of a webpage data acquisition system provided by an embodiment of the present disclosure. As shown in Fig. 1, the data acquisition system may include a scheduling node 110, a network 120, an acquisition node 130 and a database 140. The scheduling node 110 and/or the acquisition node 130 may include a processor that executes instructions. The scheduling node 110, that is, the task scheduling node, may be a server or another device with a scheduling function. The acquisition node 130 is a working node in the data acquisition system, that is, a working node that runs data acquisition tasks, and is also referred to as a task acquisition node. The acquisition node 130 may be a server or another device with a data acquisition function.
In some embodiments, the scheduling node 110 and/or the acquisition node 130 may be a single node or a node group, also called a node cluster. The node group may be a centralized cluster or a distributed cluster. For example, the scheduling node 110 is a scheduling node in a distributed cluster, and the acquisition node 130 is an acquisition node in a distributed cluster.
In some embodiments, the scheduling node 110 and/or the acquisition node 130 may include a processor. In some embodiments, the processor may include one or more processing cores (for example, a single-core processor or a multi-core processor). Merely by way of example, the processor may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a graphics processing unit (GPU), a physics processing unit (PPU), a digital signal processor (DSP), a field programmable gate array (FPGA), a programmable logic device (PLD), a controller, a microcontroller unit, a reduced instruction set computer (RISC), a microprocessor, or any combination thereof.
The network 120 can be used to exchange information and/or data. In some embodiments, one or more components of the data acquisition system (for example, the scheduling node 110 and the acquisition node 130) can send information and/or data to other components. In some embodiments, the network 120 may be any type of wired or wireless network, or a combination thereof. Merely by way of example, the network 120 may include a cable network, a wireless network, a fiber-optic network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, a ZigBee network, a near field communication (NFC) network, or any combination thereof. In some embodiments, the network 120 may include one or more network access points. The network 120 may include wired or wireless network access points, such as base stations and/or network switching nodes, through which the scheduling node 110 and/or the acquisition node 130 can connect to the network 120 to exchange data and/or information.
In some embodiments, the database 140 may be connected to the network 120 to communicate with one or more components of the data acquisition system (for example, the scheduling node 110 and/or the acquisition node 130). One or more components of the data acquisition system can access the data or instructions stored in the database 140 via the network 120. In some embodiments, the database 140 may be directly connected to one or more components of the data acquisition system, or the database 140 may be part of the scheduling node 110 and/or the acquisition node 130.
Fig. 2 is a flow diagram of a webpage data acquiring method provided by an embodiment of the present disclosure. The method may be executed by the scheduling node 110 of the data acquisition system in Fig. 1 above. As shown in Fig. 2, the method comprises:
S101, creating the task corresponding to the data acquisition demand according to the information of the data acquisition demand and the preset script template.
The information of the data acquisition demand may include information of a target webpage and/or a target data type, among other information. The target webpage is the webpage from which data is to be acquired, for example a social or news webpage. The target data type may be, for example, pictures, databases, audio or video multimedia. Depending on the actual application, the target webpage may also include webpages of other categories and the target data type may also include other data types; the disclosure does not limit the information of the data acquisition demand.
The data acquisition demand may be input by a user, obtained from another device, or obtained by parsing a document. Once a data acquisition demand has been obtained, the script template corresponding to it can first be determined according to the demand, and S101 is then executed to create the task corresponding to the data acquisition demand. Each data acquisition demand corresponds to a different script template. Creating the task from a preset script template improves the reusability of script templates and the efficiency of creating the task corresponding to a data acquisition demand. Optionally, the preset script template may be a crawler script template, in which case the task created from the script template for the data acquisition demand may be a crawler task.
S102, obtaining the operation resource required by the task and the operation resource status of multiple acquisition nodes.
The operation resource is the resource needed to run the task and may include at least one of physical resources, network resources and time resources; the disclosure does not limit this. An acquisition node is a node that runs data acquisition tasks, and may be the acquisition node 130 of the data acquisition system in Fig. 1 above. Once the task is determined, the operation resource required by the task is also determined. By obtaining the operation resource required by the task and the operation resource status of the multiple acquisition nodes, an acquisition node whose operation resource matches the task can be selected for it, which improves the data acquisition efficiency of the multiple acquisition nodes.
S103, determining a destination node from the multiple acquisition nodes according to the operation resource required by the task and the operation resource status of the multiple acquisition nodes.
Optionally, the operation resource status may include one or a combination of the following parameters: parameters of physical resources, parameters of network conditions, and parameters of running time. The parameters of physical resources may include parameters of the CPU, memory, network card and the like. The parameters of network conditions may include the type, rate and throughput of the accessed network. The parameters of running time may include the start time, end time and duration of a run. Depending on the actual application, other parameters may also be included; the disclosure does not limit the operation resource status.
When determining the destination node from the multiple acquisition nodes, at least one acquisition node that satisfies the operation resource needed to run the task can first be determined from the multiple acquisition nodes, and the destination node is then determined from the at least one acquisition node, so that a destination node for executing each task can be selected among the multiple acquisition nodes. Optionally, when determining the destination node from the at least one acquisition node, the destination node may be selected at random from the at least one acquisition node that satisfies the operation resource needed to run the task. Alternatively, according to a preset destination node selection mode, an acquisition node is determined as the destination node when the operation resource needed to run the task and the operation resource of the acquisition node satisfy a preset condition. For example, if the operation resource needed by task M1 is P and the operation resource of an acquisition node is Q, the acquisition node is determined as the destination node of the task when P*110% ≤ Q ≤ P*150%. It should be noted that the disclosure does not limit the manner of determining the destination node, which can be set according to the actual application.
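The example condition P*110% ≤ Q ≤ P*150% can be sketched as a small selection function. This is an illustrative sketch only: treating P and Q as single scalar values, the names `NodeStatus` and `select_destination`, and the tie-breaking rule (pick the candidate with the smallest Q, so larger nodes stay free for larger tasks) are assumptions, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class NodeStatus:
    """Hypothetical operation resource status of one acquisition node."""
    node_id: str
    available: float  # Q: operation resource the node can offer (scalar here)


def select_destination(
    required: float,
    nodes: Sequence[NodeStatus],
    low: float = 1.10,
    high: float = 1.50,
) -> Optional[NodeStatus]:
    """Return a node whose resource Q satisfies P*low <= Q <= P*high.

    `required` is P, the operation resource the task needs. Returns None
    when no acquisition node falls inside the band.
    """
    lower, upper = required * low, required * high
    candidates = [n for n in nodes if lower <= n.available <= upper]
    if not candidates:
        return None
    # Assumed tie-break: prefer the tightest fit among the candidates.
    return min(candidates, key=lambda n: n.available)
```

With P = 4.0 the band is roughly 4.4 to 6.0, so a node offering 5.0 qualifies while nodes offering 4.0 or 8.0 do not. Replacing the `min` call with a random choice would give the random-selection variant mentioned above.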
Optionally, a preset scheduling engine may determine the destination node from the multiple acquisition nodes according to the operation resource required by the task and the operation resource status of the multiple acquisition nodes.
S104, dispatching the task to the destination node, so that the destination node acquires webpage data according to the task.
After the destination node has been determined for the task by executing S103 above, the task can be dispatched to the destination node, and the destination node then carries out data acquisition according to the dispatched task.
Optionally, when the task is a crawler task, that is, a data acquisition task, the crawler task is dispatched to the destination node, and the destination node can then acquire the relevant webpage data from each webpage according to the crawler task.
In summary, in the webpage data acquiring method provided by the disclosure, a task corresponding to a data acquisition demand is created according to the information of the data acquisition demand and a preset script template, so that corresponding tasks can be created for different data acquisition demands and their corresponding preset script templates. The operation resource required by the task and the operation resource status of multiple acquisition nodes are then obtained, an acquisition node that matches the task is determined from the multiple acquisition nodes as the destination node, and the task is dispatched to the destination node so that the destination node acquires data according to the task. The operation resource required by the task is thereby associated with the operation resource status of the acquisition nodes, which realizes intelligent scheduling of acquisition nodes based on the data acquisition demand. A suitable acquisition node can be selected for the task, so that when the method is applied to the acquisition of massive network data, the data acquisition efficiency of the task is improved and the acquisition node resources are used efficiently.
Fig. 3 is a flow diagram of another webpage data acquiring method provided by an embodiment of the present disclosure. Optionally, as shown in Fig. 3, before creating the task corresponding to the data acquisition demand according to the information of the data acquisition demand and the preset script template, the method comprises:
S201, determining, according to the data acquisition demand, the script template corresponding to the data acquisition demand from a preset script template library as the preset script template.
The script template library includes a script template corresponding to at least one data acquisition demand, and different data acquisition demands correspond to different script templates.
The preset script template library may include at least one script template, and each script template may correspond to one kind of data acquisition demand. The script template corresponding to a given data acquisition demand can therefore be selected from the preset script template library, which avoids rewriting a data acquisition script for every different demand. This improves the efficiency of creating the task corresponding to a data acquisition demand and, in turn, the acquisition efficiency of the data acquisition nodes. Optionally, each preset script template in the library may be a template that a user programs for a kind of data acquisition demand and uploads to the data acquisition system.
In addition, selecting the corresponding script template from the preset script template library according to the data acquisition demand enables reuse and unified management of script templates. This can further improve the efficiency of task creation, and in turn the working efficiency of the data acquisition nodes, and it avoids the situation in which webpage data acquisition scripts written by different users differ in style and therefore cannot be reused or managed.
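As an illustration only, the template selection of S201 and the subsequent task creation might be sketched in Python as follows. The demand types, the template strings, the `Task` class and the `create_task` function are hypothetical names chosen for this sketch; the disclosure does not prescribe a template format.

```python
from dataclasses import dataclass
from string import Template

# Hypothetical preset script template library: one template per demand type.
SCRIPT_TEMPLATE_LIBRARY = {
    "news_article": Template("fetch('$url'); extract('title', 'body')"),
    "product_price": Template("fetch('$url'); extract('name', 'price')"),
}


@dataclass
class Task:
    demand_type: str
    script: str


def create_task(demand_type: str, url: str) -> Task:
    """Select the preset template for the demand, then build the task from it."""
    try:
        template = SCRIPT_TEMPLATE_LIBRARY[demand_type]
    except KeyError:
        raise ValueError(f"no script template for demand type {demand_type!r}") from None
    # The information of the demand (here only the target URL) fills the template.
    return Task(demand_type=demand_type, script=template.substitute(url=url))


task = create_task("product_price", "https://example.com/item/1")
print(task.script)
```

Because every task for a given demand type is generated from the same template, a fix to a template applies to all tasks created from it afterwards, which is the reuse and unified management benefit described above.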
Optionally, after the task corresponding to the data acquisition demand is created according to the information of the data acquisition demand and the preset script template, the task may be started and stopped manually or on a timer. The present disclosure does not limit how the task is started and stopped; an appropriate manner may be adopted according to the actual application scenario. After the task is started, according to step S103 above, the preset scheduling engine may determine the destination node from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes. For details, refer to that part of the description, which is not repeated here.
Fig. 4 is a schematic flowchart of another webpage data acquisition method provided by an embodiment of the present disclosure. Optionally, as shown in Fig. 4, the above step of obtaining the operation resources required by the task and the operation resource status of the multiple acquisition nodes includes:
S301, according to the information of the task, determine the operation resources required by the task from a preset task learning resource library.
The task learning resource library includes information on the operation resources required by at least one task in the class to which the task belongs.
The preset task learning resource library may include information on the operation resources required by at least one task in each class of tasks. Therefore, for a task created according to the data acquisition demand, the operation resources needed to run it can be determined from the preset task learning resource library. Optionally, the task learning resource library may be updated at a preset update period, so that it contains information on the operation resources required by multiple tasks; the operation resources required by a task can then be determined from the preset task learning resource library according to the information of that task.
For example, suppose the operation resources include physical resources (CPU, memory and network card), network condition resources and running-time resources, and the task learning resource library records that running task M1 requires physical resources A1, network condition resources B1 and running-time resources C1, while running task M2 requires physical resources A2, network condition resources B2 and running-time resources C2. If the task created according to the information of the data acquisition demand and the preset script template is M2, then, according to the information of the task, it can be learned from the preset task learning resource library that running the crawling task M2 requires physical resources A2, network condition resources B2 and running-time resources C2.
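The lookup in this example can be pictured as a keyed table. The following Python sketch is illustrative only; `RequiredResources`, `TASK_LEARNING_RESOURCE_LIBRARY` and `required_resources` are hypothetical names, and A1 to C2 are kept as the placeholder values used in the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RequiredResources:
    physical: str   # CPU, memory and network card
    network: str    # network condition
    run_time: str   # running time


# Placeholder entries mirroring tasks M1 and M2 of the example.
TASK_LEARNING_RESOURCE_LIBRARY = {
    "M1": RequiredResources(physical="A1", network="B1", run_time="C1"),
    "M2": RequiredResources(physical="A2", network="B2", run_time="C2"),
}


def required_resources(task_id: str) -> Optional[RequiredResources]:
    """Return the learned requirement for a task, or None if it is not yet known."""
    return TASK_LEARNING_RESOURCE_LIBRARY.get(task_id)


print(required_resources("M2"))
```

Returning `None` for an unknown task is a design choice of the sketch; the disclosure does not say how a task without a learned entry is handled.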
S302, obtain the operation resource status from the multiple acquisition nodes.
Depending on its actual workload, the operation resource status of each acquisition node may differ from one time period to another. Optionally, the operation resource status may be obtained from the multiple acquisition nodes at a preset time period, so that the latest operation resource status of the multiple acquisition nodes is available. The preset period may be 1 minute, 5 minutes, and so on; the present disclosure does not limit it.
Optionally, before the operation resources required by the task are determined from the preset task learning resource library according to the information of the task, the method further includes: determining the operation resources required by each task according to each task, in the class to which the task belongs, that ran on the multiple acquisition nodes within a historical time period, and according to the operation resource status of each acquisition node; and obtaining the task learning resource library according to the information on the operation resources required by at least one task.
By comparing the tasks that ran on the multiple acquisition nodes within the historical time period with the operation resources in use, the operation resources required by each task can be determined. The task learning resource library is then obtained from the information on the operation resources required by each task; that is, the information on the operation resources required by each task is updated in the task learning resource library accordingly. Optionally, when step S103 is executed and the destination node is determined from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes, the acquisition node that best satisfies the operation resources required by the task may be selected from the multiple acquisition nodes as the destination node, so as to improve the data acquisition efficiency of the task and make efficient use of acquisition node resources.
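The disclosure does not define how the node that "best satisfies" the requirement is chosen. One possible reading, sketched below in Python, is a best-fit rule: keep only nodes whose free resources cover the requirement, then pick the one with the least relative slack. The `Resources` fields, the `slack` score and the function names are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Resources:
    cpu_cores: float
    memory_mb: float
    bandwidth_mbps: float

    def covers(self, need: "Resources") -> bool:
        """True if every free resource is at least the required amount."""
        return (self.cpu_cores >= need.cpu_cores
                and self.memory_mb >= need.memory_mb
                and self.bandwidth_mbps >= need.bandwidth_mbps)


def slack(free: Resources, need: Resources) -> float:
    """Sum of the relative leftovers per resource (assumed scoring rule)."""
    pairs = [(free.cpu_cores, need.cpu_cores),
             (free.memory_mb, need.memory_mb),
             (free.bandwidth_mbps, need.bandwidth_mbps)]
    return sum((f - n) / f if f > 0 else 0.0 for f, n in pairs)


def select_destination_node(need: Resources,
                            free_by_node: Dict[str, Resources]) -> Optional[str]:
    """Pick the feasible node with the smallest slack, or None if none fits."""
    feasible = {name: free for name, free in free_by_node.items() if free.covers(need)}
    if not feasible:
        return None
    return min(feasible, key=lambda name: slack(feasible[name], need))


nodes = {
    "node-1": Resources(cpu_cores=8, memory_mb=16000, bandwidth_mbps=100),
    "node-2": Resources(cpu_cores=2, memory_mb=4000, bandwidth_mbps=50),
    "node-3": Resources(cpu_cores=1, memory_mb=1000, bandwidth_mbps=10),
}
print(select_destination_node(Resources(2, 2000, 20), nodes))
```

A best-fit rule keeps large nodes free for large tasks; a scheduler that prefers the node with the most headroom would be an equally valid reading of the text.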
For example, assume that acquisition nodes 1 and 2 have the same initial physical resources, that tasks M1 and M2 run on acquisition node 1, and that tasks M1, M2 and M3 run on acquisition node 2. By comparing the tasks running on acquisition node 1 with those running on acquisition node 2, the physical resources used by task M3, such as CPU, memory and network card, can be estimated. While task M3 is running, the network condition can also be monitored continuously to obtain the network resources required by task M3; when task M3 finishes running, the time resources required by task M3 are obtained. The operation resources required for task M3, created from the preset script template, to run efficiently can thus be determined.
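The comparison in this example amounts to a per-metric difference between the two nodes. A minimal Python sketch follows; the metric names and the usage figures are invented for illustration, and the subtraction is only a rough estimate that assumes M1 and M2 behave the same on both nodes.

```python
from typing import Dict


def estimate_task_usage(baseline: Dict[str, int],
                        with_task: Dict[str, int]) -> Dict[str, int]:
    """Estimate the extra task's usage as the per-metric difference, floored at 0."""
    return {metric: max(0, with_task[metric] - baseline[metric]) for metric in baseline}


# Hypothetical measurements: node 1 runs M1 and M2, node 2 runs M1, M2 and M3.
node1_usage = {"cpu_percent": 35, "memory_mb": 1200, "net_mbps": 20}
node2_usage = {"cpu_percent": 50, "memory_mb": 1700, "net_mbps": 32}

m3_estimate = estimate_task_usage(node1_usage, node2_usage)
print(m3_estimate)
```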
Fig. 5 is a schematic flowchart of another webpage data acquisition method provided by an embodiment of the present disclosure. Optionally, as shown in Fig. 5, the above step of obtaining the operation resource status from the multiple acquisition nodes includes:
S401, send a heartbeat request to the multiple acquisition nodes at a preset time period.
When the operation resource status of each of the multiple acquisition nodes needs to be obtained, a heartbeat request may be sent to the multiple acquisition nodes at the preset time period to request their operation resource status. The preset time period may be 1 minute, 5 minutes, and so on, and may be adjusted according to the actual application scenario; the present disclosure does not limit its specific value.
S402, receive the heartbeat response sent by each acquisition node, the heartbeat response including the working state of each acquisition node and the operation resource status of each acquisition node.
When an acquisition node receives the heartbeat request, it may send its working state and operation resource status to the task scheduling server, and the task scheduling server then receives the heartbeat response sent by each acquisition node. Accordingly, the heartbeat response includes the working state and the operation resource status of each acquisition node, where the working state of an acquisition node may include a normal working state and a fault state.
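Steps S401 and S402 can be sketched as one polling round. The Python below is a single-process simulation under stated assumptions: the class names, the `working_state` strings and the resource keys are hypothetical, and a real system would send the request over the network and call `poll_once` from a timer at the preset period.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HeartbeatResponse:
    node_id: str
    working_state: str                 # "normal" or "fault"
    resource_status: Dict[str, float]  # e.g. free CPU share, free memory


class AcquisitionNode:
    """In-process stand-in for a remote acquisition node."""

    def __init__(self, node_id: str, resource_status: Dict[str, float],
                 healthy: bool = True):
        self.node_id = node_id
        self.resource_status = resource_status
        self.healthy = healthy

    def on_heartbeat_request(self) -> HeartbeatResponse:
        state = "normal" if self.healthy else "fault"
        return HeartbeatResponse(self.node_id, state, dict(self.resource_status))


class TaskSchedulingServer:
    def __init__(self, nodes: List[AcquisitionNode]):
        self.nodes = nodes
        self.latest: Dict[str, HeartbeatResponse] = {}

    def poll_once(self) -> None:
        """One period: request a heartbeat from every node and keep the latest reply."""
        for node in self.nodes:
            response = node.on_heartbeat_request()
            self.latest[response.node_id] = response

    def schedulable_nodes(self) -> List[str]:
        """Nodes whose last heartbeat reported a normal working state."""
        return [nid for nid, r in self.latest.items() if r.working_state == "normal"]


server = TaskSchedulingServer([
    AcquisitionNode("node-1", {"cpu_free": 0.6, "memory_free_mb": 2048}),
    AcquisitionNode("node-2", {"cpu_free": 0.1, "memory_free_mb": 512}, healthy=False),
])
server.poll_once()
print(server.schedulable_nodes())
```

Keeping only the most recent response per node matches the stated goal of scheduling against the latest operation resource status.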
With this webpage data acquisition method, the latest operation resource status of the multiple acquisition nodes can be obtained, so that the destination node is determined from the operation resources required by the task and the latest operation resource status of the multiple acquisition nodes, and a suitable destination node is selected for the task.
Fig. 6 is a schematic flowchart of another webpage data acquisition method provided by an embodiment of the present disclosure. Optionally, as shown in Fig. 6, the above method may further include:
S501, receive the operation resource status and the execution state of the task that the destination node sends at a preset time period while performing data acquisition.
After the destination node is determined from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes, the task scheduling server also receives the operation resource status and the execution state of the task that the destination node sends at the preset time period while performing data acquisition, so that the server can keep track in real time of the working state of the destination node and the running state of the task. Optionally, the operation resource status may include parameters of the physical resources of the destination node, parameters of the network condition, parameters of the running time, and so on; the present disclosure does not limit this. The execution state of the task may include task in progress, task abnormally ended and task normally completed, and may further include other states depending on the actual application; the present disclosure does not limit this.
Optionally, if the execution state of the task includes abnormal-termination information, which indicates that the data acquisition corresponding to the task has not been completed but the task has ended, the above method further includes: receiving the running log of the task returned by the destination node, the running log including the operating parameters of the task during its run.
If the task scheduling server receives a task abnormality report from the destination node, the task scheduling server also receives the running log of the task returned by the destination node. Through the returned running log, the user can learn the operating parameters of the task during its run, which makes it easier to inspect, analyze and modify the operating parameters of a crawling task that ended abnormally, facilitates later maintenance of the task or of the preset script template, and improves maintainability.
Fig. 7 is a schematic flowchart of another webpage data acquisition method provided by an embodiment of the present disclosure. Optionally, as shown in Fig. 7, the above method further includes:
S601, receive the end message of the task sent by the destination node.
When the destination node performs data acquisition according to the task and the task ends normally, the task scheduling server also receives the end message of the task sent by the destination node, which identifies the normal end of the task.
S602, perform cleaning and/or storage operations on the data obtained by the data acquisition.
The data obtained by the destination node through data acquisition according to the task may further undergo data cleaning and/or a storage operation. Data cleaning refers to the process of re-examining and verifying the data, with the aim of deleting duplicate information, correcting existing errors and ensuring data consistency. The storage operation refers to loading the data obtained by data acquisition into a preset database. The present disclosure does not limit the manner of data cleaning or the type of database used for storage; these may be chosen according to the application.
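A minimal Python sketch of S602 follows, assuming that each acquired record is a dictionary with `url` and `title` fields and that the preset database is SQLite; both are assumptions of the sketch, since the disclosure leaves the cleaning rules and the database type open.

```python
import sqlite3
from typing import Dict, Iterable, List


def clean(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Trim whitespace, drop records without a URL, and drop duplicate URLs."""
    seen = set()
    cleaned = []
    for record in records:
        url = (record.get("url") or "").strip()
        title = (record.get("title") or "").strip()
        if not url or url in seen:
            continue
        seen.add(url)
        cleaned.append({"url": url, "title": title})
    return cleaned


def store(conn: sqlite3.Connection, records: List[Dict[str, str]]) -> int:
    """Load cleaned records into the database and return the resulting row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT OR REPLACE INTO pages (url, title) VALUES (:url, :title)", records
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]


raw = [
    {"url": " https://example.com/a ", "title": "A"},
    {"url": "https://example.com/a", "title": "A (duplicate)"},
    {"url": "", "title": "missing url"},
    {"url": "https://example.com/b", "title": " B "},
]
connection = sqlite3.connect(":memory:")
print(store(connection, clean(raw)))
```

The primary key on `url` together with `INSERT OR REPLACE` keeps the table consistent even if the same page is acquired again by a later task.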
Fig. 8 is a schematic flowchart of another webpage data acquisition method provided by an embodiment of the present disclosure. Optionally, as shown in Fig. 8, the method includes:
S801, according to the data acquisition demand, determine, from the preset script template library, the script template corresponding to the data acquisition demand as the preset script template.
S802, create the task corresponding to the data acquisition demand according to the information of the data acquisition demand and the preset script template.
S803, obtain the operation resources required by the task and the operation resource status of the multiple acquisition nodes.
S804, determine the destination node from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes.
S805, issue the task to the destination node so that the destination node acquires webpage data according to the task.
S806, receive the operation resource status and the execution state of the task that the destination node sends at the preset time period while performing data acquisition.
S807, determine whether the execution state of the task includes abnormal-termination information.
S808, receive the running log of the task returned by the destination node, the running log including the operating parameters of the task during its run.
S809, receive the end message of the task sent by the destination node, and perform cleaning and/or storage operations on the data obtained by the data acquisition.
It should be noted that if the execution state of the task includes abnormal-termination information, S808 is executed; otherwise, S809 is executed. In addition, the basic principle and technical effects of this embodiment are the same as those of the corresponding method embodiments described above; for details, refer to the corresponding content in those method embodiments, which is not repeated here.
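The branch between S808 and S809 can be sketched as a small dispatch function. In the Python below, the `ExecutionState` values and the two callbacks are hypothetical; the sketch additionally assumes that a task still in progress simply waits for the next periodic report of S806.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class ExecutionState(Enum):
    IN_PROGRESS = "in_progress"
    ABNORMAL_END = "abnormal_end"
    NORMAL_END = "normal_end"


def handle_report(state: ExecutionState,
                  fetch_running_log: Callable[[], str],
                  clean_and_store: Callable[[], str]) -> Tuple[str, Optional[str]]:
    """S807: decide which step follows a periodic status report."""
    if state is ExecutionState.ABNORMAL_END:
        # S808: collect the running log so the user can inspect the failed run.
        return "S808", fetch_running_log()
    if state is ExecutionState.NORMAL_END:
        # S809: the task ended normally, so clean and/or store the acquired data.
        return "S809", clean_and_store()
    # Assumed behaviour: keep waiting for the next report from the destination node.
    return "S806", None


step, result = handle_report(
    ExecutionState.ABNORMAL_END,
    fetch_running_log=lambda: "timeout after 30 s on page 12",
    clean_and_store=lambda: "stored",
)
print(step, result)
```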
Fig. 9 is a schematic structural diagram of a webpage data acquisition device provided by an embodiment of the present disclosure. The basic principle and technical effects of the device are the same as those of the corresponding method embodiments described above; for brevity, for anything not mentioned in this embodiment, refer to the corresponding content in the method embodiments. As shown in Fig. 9, the device includes a creation module 210, an obtaining module 220, a determining module 230 and an acquisition module 240. The creation module 210 is configured to create the task corresponding to the data acquisition demand according to the information of the data acquisition demand and the preset script template. The obtaining module 220 is configured to obtain the operation resources required by the task and the operation resource status of the multiple acquisition nodes. The determining module 230 is configured to determine the destination node from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes. The acquisition module 240 is configured to issue the task to the destination node so that the destination node acquires webpage data according to the task.
Optionally, the data acquisition demand in the above device may include information of the target webpage and/or the type of the target data. Optionally, the operation resource status in the above device may include one or a combination of the following parameters: parameters of the physical resources, parameters of the network condition, and parameters of the running time.
Optionally, the creation module 210 is specifically configured to determine, according to the data acquisition demand and from the preset script template library, the script template corresponding to the data acquisition demand as the preset script template. The script template library includes script templates corresponding to at least one data acquisition demand, and different data acquisition demands correspond to different script templates.
Optionally, the obtaining module 220 is specifically configured to determine, according to the information of the task, the operation resources required by the task from the preset task learning resource library, the task learning resource library including information on the operation resources required by at least one task in the class to which the task belongs; and to obtain the operation resource status from the multiple acquisition nodes.
Optionally, the determining module 220 is specifically configured to determine the operation resources required by each task according to each task, in the class to which the task belongs, that ran on the multiple acquisition nodes within a historical time period, and according to the operation resource status of each acquisition node; and to obtain the task learning resource library according to the information on the operation resources required by at least one task.
Optionally, the obtaining module 220 is specifically configured to send a heartbeat request to the multiple acquisition nodes at the preset time period, and to receive the heartbeat response sent by each acquisition node, the heartbeat response including the working state of each acquisition node and the operation resource status of each acquisition node.
Fig. 10 is a schematic structural diagram of another webpage data acquisition device provided by an embodiment of the present disclosure. Optionally, as shown in Fig. 10, the above device further includes a first receiving module 250, configured to receive the operation resource status and the execution state of the task that the destination node sends at the preset time period while performing data acquisition.
Optionally, if the execution state of the task includes abnormal-termination information, which indicates that the data acquisition corresponding to the task has not been completed but the task has ended, the first receiving module 250 is further configured to receive the running log of the task returned by the destination node, the running log including the operating parameters of the task during its run.
Fig. 11 is a schematic structural diagram of another webpage data acquisition device provided by an embodiment of the present disclosure. Optionally, as shown in Fig. 11, the above device further includes a second receiving module 260 and an operation module 270. The second receiving module 260 is configured to receive the end message of the task sent by the destination node; the operation module 270 is configured to perform cleaning and/or storage operations on the data obtained by the data acquisition.
The above device is used to execute the method provided by the foregoing embodiments; its implementation principle and technical effects are similar and are not repeated here.
The above modules may be one or more integrated circuits configured to implement the above method, for example: one or more application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), one or more digital signal processors (Digital Signal Processor, DSP), or one or more field-programmable gate arrays (Field Programmable Gate Array, FPGA). As another example, when one of the above modules is implemented in the form of a processing element scheduling program code, the processing element may be a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU) or another processor capable of calling program code. As a further example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
Fig. 12 is a schematic structural diagram of webpage data acquisition equipment provided by an embodiment of the present disclosure. As shown in Fig. 12, the data acquisition equipment may include a processor 310, a storage medium 320 and a bus 330. The storage medium 320 stores machine-readable instructions executable by the processor 310. When the data acquisition equipment runs, the processor 310 and the storage medium 320 communicate through the bus 330, and the processor 310 executes the machine-readable instructions to implement the above method embodiments. The implementation principle and technical effects are similar and are not repeated here.
The data acquisition equipment shown in Fig. 12 may be the scheduling node in the webpage data acquisition system shown in Fig. 1 above, and may be a server or other computer equipment with a scheduling function.
Optionally, the present disclosure further provides a storage medium storing a computer program; when the computer program is read and run by a processor, the above method embodiments can be implemented.
In the several embodiments provided by the present disclosure, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function; in actual implementation there may be other ways of dividing, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disc.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The foregoing describes only preferred embodiments of the present disclosure and is not intended to limit the present disclosure; for those skilled in the art, the present disclosure may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principle of the present disclosure shall fall within the protection scope of the present disclosure. It should also be noted that similar reference numerals and letters denote similar items in the drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
Claims (10)
1. A webpage data acquisition method, characterized in that the method comprises:
creating, according to information of a data acquisition demand and a preset script template, a task corresponding to the data acquisition demand;
obtaining operation resources required by the task and an operation resource status of multiple acquisition nodes;
determining a destination node from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes; and
issuing the task to the destination node so that the destination node acquires webpage data according to the task.
2. The method according to claim 1, characterized in that, before the creating, according to the information of the data acquisition demand and the preset script template, of the task corresponding to the data acquisition demand, the method comprises:
determining, according to the data acquisition demand and from a preset script template library, the script template corresponding to the data acquisition demand as the preset script template; wherein the script template library comprises script templates corresponding to at least one data acquisition demand, and different data acquisition demands correspond to different script templates.
3. The method according to claim 1, characterized in that the obtaining of the operation resources required by the task and the operation resource status of the multiple acquisition nodes comprises:
determining, according to information of the task, the operation resources required by the task from a preset task learning resource library, wherein the task learning resource library comprises information on the operation resources required by at least one task in the class to which the task belongs; and
obtaining the operation resource status from the multiple acquisition nodes.
4. The method according to claim 3, characterized in that the obtaining of the operation resource status from the multiple acquisition nodes comprises:
sending a heartbeat request to the multiple acquisition nodes at a preset time period; and
receiving a heartbeat response sent by each acquisition node, wherein the heartbeat response comprises the working state of each acquisition node and the operation resource status of each acquisition node.
5. The method according to claim 1, characterized in that the method further comprises:
receiving the operation resource status and the execution state of the task that the destination node sends at a preset time period while performing data acquisition.
6. The method according to claim 5, characterized in that, if the execution state of the task comprises abnormal-termination information, the abnormal-termination information indicating that the data acquisition corresponding to the task has not been completed but the task has ended, the method further comprises:
receiving a running log of the task returned by the destination node, wherein the running log comprises operating parameters of the task during its run.
7. The method according to any one of claims 1 to 6, characterized in that the method further comprises:
receiving an end message of the task sent by the destination node; and
performing cleaning and/or storage operations on the data obtained by the data acquisition.
8. A webpage data acquisition device, characterized in that the device comprises a creation module, an obtaining module, a determining module and an acquisition module;
the creation module is configured to create, according to information of a data acquisition demand and a preset script template, a task corresponding to the data acquisition demand;
the obtaining module is configured to obtain operation resources required by the task and an operation resource status of multiple acquisition nodes;
the determining module is configured to determine a destination node from the multiple acquisition nodes according to the operation resources required by the task and the operation resource status of the multiple acquisition nodes; and
the acquisition module is configured to issue the task to the destination node so that the destination node acquires webpage data according to the task.
9. Webpage data acquisition equipment, characterized by comprising: a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor; when the webpage data acquisition equipment runs, the processor and the storage medium communicate through the bus, and the processor executes the machine-readable instructions to perform the steps of the webpage data acquisition method according to any one of claims 1 to 7.
10. A storage medium, characterized in that a computer program is stored on the storage medium, and when the computer program is run by a processor, the steps of the webpage data acquisition method according to any one of claims 1 to 7 are performed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910627107.1A CN110442766A (en) | 2019-07-11 | 2019-07-11 | Webpage data acquiring method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910627107.1A CN110442766A (en) | 2019-07-11 | 2019-07-11 | Webpage data acquiring method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110442766A true CN110442766A (en) | 2019-11-12 |
Family
ID=68430315
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910627107.1A Pending CN110442766A (en) | 2019-07-11 | 2019-07-11 | Webpage data acquiring method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110442766A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111147325A (en) * | 2019-12-18 | 2020-05-12 | 深圳市任子行科技开发有限公司 | Distributed network information acquisition system, method and computer readable storage medium |
CN111708931A (en) * | 2020-06-06 | 2020-09-25 | 谢国柱 | Big data acquisition method based on mobile internet and artificial intelligence cloud service platform |
CN114793194A (en) * | 2022-03-09 | 2022-07-26 | 中国邮政储蓄银行股份有限公司 | Service data processing method and device and computer readable storage medium |
Patent Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090292691A1 (en) * | 2008-05-21 | 2009-11-26 | Sungkyunkwan University Foundation For Corporate Collaboration | System and Method for Building Multi-Concept Network Based on User's Web Usage Data |
CN101715004A (en) * | 2009-11-12 | 2010-05-26 | 中国科学院计算技术研究所 | Internet video-oriented distributed acquisition method and system |
CN103745322A (en) * | 2014-01-22 | 2014-04-23 | 云南电力调度控制中心 | Province-city secondary system integrated comprehensive monitoring and process management system in power dispatching and implementation method for system |
CN104735138A (en) * | 2015-03-09 | 2015-06-24 | 中国科学院计算技术研究所 | Distributed acquisition method and system oriented to user generated content |
CN106372082A (en) * | 2015-07-22 | 2017-02-01 | 克拉玛依红有软件有限责任公司 | Single-file multi-form data automatic storage method and system |
US20170024472A1 (en) * | 2015-07-23 | 2017-01-26 | Green Prestige Pte. Ltd. | Information retrieval method utilizing webpage visual and language features and system using thereof |
CN105302785A (en) * | 2015-09-24 | 2016-02-03 | 金蝶软件(中国)有限公司 | Data collection method and system |
CN106570011A (en) * | 2015-10-09 | 2017-04-19 | 北京京东尚科信息技术有限公司 | Distributed crawler URL seed distribution method, dispatching node, and grabbing node |
CN107562541A (en) * | 2017-09-05 | 2018-01-09 | 广东科杰通信息科技有限公司 | A kind of distributed reptile method of load balancing, crawler system |
CN107918674A (en) * | 2017-12-12 | 2018-04-17 | 携程旅游网络技术(上海)有限公司 | Acquisition method and its system, storage medium, the electronic equipment of web data |
CN108304498A (en) * | 2018-01-12 | 2018-07-20 | 深圳壹账通智能科技有限公司 | Webpage data acquiring method, device, computer equipment and storage medium |
CN108551404A (en) * | 2018-04-20 | 2018-09-18 | 北京百度网讯科技有限公司 | Method, apparatus, storage medium and the terminal device of client-side information analysis |
CN108768791A (en) * | 2018-07-04 | 2018-11-06 | 山东汇贸电子口岸有限公司 | A kind of information collection configuration management system and method |
CN109523446A (en) * | 2018-10-19 | 2019-03-26 | 北京北大软件工程股份有限公司 | A kind of big data processing analysis system towards price field |
CN109814992A (en) * | 2018-12-29 | 2019-05-28 | 中国科学院计算技术研究所 | Distributed dynamic dispatching method and system for the acquisition of large scale network data |
CN109740038A (en) * | 2019-01-02 | 2019-05-10 | 安徽芃睿科技有限公司 | Network data distributed parallel computing environment and method |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111147325A (en) * | 2019-12-18 | 2020-05-12 | 深圳市任子行科技开发有限公司 | Distributed network information acquisition system, method and computer readable storage medium |
CN111708931A (en) * | 2020-06-06 | 2020-09-25 | 谢国柱 | Big data acquisition method based on mobile internet and artificial intelligence cloud service platform |
CN111708931B (en) * | 2020-06-06 | 2020-12-25 | 湖南伟业动物营养集团股份有限公司 | Big data acquisition method based on mobile internet and artificial intelligence cloud service platform |
CN114793194A (en) * | 2022-03-09 | 2022-07-26 | 中国邮政储蓄银行股份有限公司 | Service data processing method and device and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104067257B (en) | Automate event management system, management event method and event management system | |
Lind | Iterative software engineering for multiagent systems: the MASSIVE method | |
US20200272911A1 (en) | A cognitive automation engineering system | |
CN108353090A (en) | Edge intelligence platform and internet of things sensors streaming system | |
CN110442766A (en) | Webpage data acquiring method, device, equipment and storage medium | |
CN108243012B (en) | Charging application processing system, method and device in OCS (online charging System) | |
Cui et al. | Scenario analysis of web service composition based on multi-criteria mathematical goal programming | |
Marzolla | Simulation-based performance modeling of UML software architectures. | |
Qian et al. | A workflow-aided Internet of things paradigm with intelligent edge computing | |
Leitao et al. | A survey on factors that impact industrial agent acceptance | |
Wagner | Tutorial: Information and process modeling for simulation | |
CN108959488A (en) | Safeguard the method and device of Question-Answering Model | |
EP4024761A1 (en) | Communication method and apparatus for multiple management domains | |
Hopp et al. | A diagnostic tree for improving production line performance | |
Virani et al. | Service composition based on multi agent in cloud environment | |
US11816548B2 (en) | Distributed learning using ensemble-based fusion | |
Nguyen et al. | Real-time optimisation for industrial internet of things (IIoT): Overview, challenges and opportunities | |
Wolf | Succeedings of the second international software architecture workshop (isaw-2) | |
Cicirelli et al. | Using time stream Petri nets for workflow modelling analysis and enactment | |
Tuli et al. | Optimizing the Performance of Fog Computing Environments Using AI and Co-Simulation | |
CN109118151B (en) | Work order transaction processing method and work order transaction processing system | |
Khelifati et al. | A multi-agent approach for scheduling jobs and maintenance operations in the flowshop sequencing problem | |
Maamar et al. | How to Make Business Processes "Socialize"? | |
Garcés et al. | Towards an architectural patterns language for systems-of-systems | |
Tapia et al. | Organizations of agents in information fusion environments |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191112 |