CN105447097A - Data acquisition method and system - Google Patents

Publication number
CN105447097A
Authority
CN
China
Prior art keywords
node
acquisition
host node
tasks
cluster database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510765008.1A
Other languages
Chinese (zh)
Inventor
龚建新
王周松
郑平贺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing VRV Software Corp Ltd
Original Assignee
Beijing VRV Software Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing VRV Software Corp Ltd
Priority to CN201510765008.1A
Publication of CN105447097A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a data acquisition method and system. The method comprises: automatically registering to a zookeeper server when multiple nodes are started; the zookeeper server determining a master node and acquisition nodes from the multiple nodes; the master node acquiring acquisition tasks and storing the acquired acquisition tasks into a cluster database; the master node distributing the acquisition tasks that have been stored in the cluster database to the master node and/or the multiple acquisition nodes so as to enable the master node and/or the acquisition nodes to read the corresponding acquisition tasks in the cluster database; and when a fault occurs to one acquisition node, the master node taking the acquisition task of the acquisition node back, and distributing the acquisition task to the master node and/or the other acquisition nodes except the acquisition node. The data acquisition method and system provided by the present invention achieves the monitoring and management of a valid state of a crawler cluster, improves the efficiency of data acquisition, and has a relatively good fault-tolerance capability.

Description

Data acquisition method and system
Technical field
The present invention relates to the technical field of data acquisition, and in particular to a data acquisition method and system.
Background
Many open-source web crawlers exist. Heritrix, for example, is an open-source web crawler developed in Java that users can use to fetch the resources they want from the network. However, it is a single-instance crawler, and its crawlers cannot cooperate with one another. Because each crawler works on its own, recovery is very poor when the hardware or the system fails.
Nutch is also an open-source crawler system, but its architecture is overly complex and its flexibility is low. It cannot effectively manage and monitor acquisition.
Summary of the invention
In view of the defects of the prior art, the present invention provides a data acquisition method and system that achieve effective monitoring and management of the state of a crawler cluster, improve the efficiency of data acquisition, and have strong fault tolerance.
In a first aspect, the present invention provides a data acquisition method, the method comprising:
multiple nodes automatically register with a zookeeper server when they start;
the zookeeper server determines a master node and acquisition nodes from the multiple nodes;
the master node obtains acquisition tasks and stores the obtained acquisition tasks in a cluster database;
the master node distributes the acquisition tasks stored in the cluster database to the master node and/or multiple acquisition nodes, so that the master node and/or each acquisition node reads its corresponding acquisition tasks from the cluster database;
when an acquisition node fails, the master node takes back the acquisition tasks of that acquisition node and distributes them to the master node and/or the acquisition nodes other than the failed one.
Wherein the master node obtaining acquisition tasks and storing the obtained acquisition tasks in the cluster database comprises:
the master node obtains acquisition tasks and stores them, according to task identifier, in the respective databases of the cluster database.
Wherein the master node distributing the acquisition tasks stored in the cluster database to the master node and/or multiple acquisition nodes, so that the master node and/or each acquisition node reads its corresponding acquisition tasks from the cluster database, comprises:
the master node distributes the acquisition tasks stored in the cluster database to the master node and/or multiple acquisition nodes according to task identifier and records the allocation result in the zookeeper server, so that the master node and/or each acquisition node obtains its own task identifiers through the zookeeper server and reads the acquisition tasks from the corresponding databases of the cluster database according to those task identifiers.
Wherein the method further comprises:
when an acquisition node fails, the master node sends an alarm message about the failure of that acquisition node to a management center.
Wherein the method further comprises:
the zookeeper server determines a slave node from the multiple nodes, the slave node being used to take over the work of the master node after the master node fails.
Wherein the method further comprises:
the slave node sends an alarm message about the failure of the master node to the management center.
Wherein the zookeeper server determines the first node to register with the zookeeper server as the master node, the second node to register as the slave node, and the N-th node to register as an acquisition node, where N > 2 and N is a positive integer.
In a second aspect, the present invention also provides a data acquisition system, comprising: a zookeeper server, a cluster database and multiple nodes; the multiple nodes automatically register with the zookeeper server when they start;
the zookeeper server is configured to determine a master node and acquisition nodes from the multiple nodes;
the master node is configured to obtain acquisition tasks and store the obtained acquisition tasks in the cluster database;
the master node is further configured to distribute the acquisition tasks stored in the cluster database to the master node and/or multiple acquisition nodes, so that the master node and/or each acquisition node goes to the cluster database to carry out its corresponding acquisition tasks;
the master node is further configured to, when an acquisition node fails, take back the acquisition tasks of that acquisition node and distribute them to the master node and/or the acquisition nodes other than the failed one.
Wherein the master node is specifically configured to: obtain acquisition tasks and store them, according to task identifier, in the respective databases of the cluster database.
Wherein the master node is further specifically configured to: distribute the acquisition tasks stored in the cluster database to the master node and/or multiple acquisition nodes according to task identifier, and record the allocation result in the zookeeper server;
the master node is further specifically configured to: send a request signal to the zookeeper server to request the task identifiers corresponding to the master node, and read the acquisition tasks from the corresponding databases of the cluster database according to the task identifiers obtained;
and/or,
the acquisition node is specifically configured to: send a request signal to the zookeeper server to request the task identifiers corresponding to the acquisition node, and read the acquisition tasks from the corresponding databases of the cluster database according to the task identifiers obtained.
As can be seen from the above technical solution, the data acquisition method provided by the present invention adopts a cluster mode in which the zookeeper server coordinates management and task allocation. This achieves effective monitoring and management of the state of the crawler cluster and rational scheduling of the crawler cluster, makes data acquisition more flexible, and improves its efficiency. The data acquisition method of the present invention also has a strong fault-tolerance mechanism.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below obviously show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the data acquisition method provided by the first embodiment of the present invention;
Fig. 2 is a schematic diagram of the working principle of the data acquisition method provided by the present invention;
Fig. 3 is a structural diagram of the data acquisition system provided by the second embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the disclosed embodiments. The described embodiments are obviously only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 1 shows a flowchart of the data acquisition method provided by the first embodiment of the present invention. As shown in Fig. 1, the data acquisition method of this embodiment comprises the following steps:
Step 101: multiple nodes automatically register with the zookeeper server when they start.
In this step, an acquisition application is deployed on each of the multiple nodes, and the nodes automatically register with the zookeeper server at startup. A node in this embodiment may be a computer on which the acquisition application is deployed.
Step 102: the zookeeper server determines a master node and acquisition nodes from the multiple nodes.
In this step, the zookeeper server determines the master node and the acquisition nodes from the multiple nodes. For example, the zookeeper server can determine them automatically according to the order in which the nodes register with it.
Step 103: the master node obtains acquisition tasks and stores the obtained acquisition tasks in the cluster database.
In this step, preferably, the master node obtaining acquisition tasks and storing them in the cluster database comprises:
the master node obtains acquisition tasks and stores them, according to task identifier, in the respective databases of the cluster database.
In this step, assuming that the cluster database is redis, the master node is responsible for reading the task library, initializing the seeds, and storing them in the redis cluster with the taskid as the key of a queue. This has two advantages: 1. because storage in the redis cluster is divided by task, the data can be spread evenly across the cluster, so acquisition tasks do not pile up on one machine and put it under pressure; 2. because the division is by task, tasks can be assigned to acquisition nodes, which keeps each node's tasks independent.
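The storage scheme of step 103 can be sketched as follows. This is a minimal illustration, not the patented implementation: the in-memory `cluster` list stands in for the redis cluster, and the names `shard_for`, `store_seeds` and `NUM_DATABASES`, as well as the CRC32-based placement, are assumptions made for the example. (Redis Cluster itself places keys by hash slot; the source does not say how a database is chosen for a given taskid.)

```python
import zlib
from collections import defaultdict, deque

NUM_DATABASES = 6  # Fig. 2 shows six Redis databases; any other count is an assumption


def shard_for(task_id, num_databases=NUM_DATABASES):
    """Map a task id to a database index with a stable hash (assumed: CRC32)."""
    return zlib.crc32(task_id.encode("utf-8")) % num_databases


def store_seeds(cluster, task_id, seed_urls):
    """Append seed URLs to the queue whose key is task_id; return the database index."""
    db_index = shard_for(task_id, len(cluster))
    cluster[db_index][task_id].extend(seed_urls)
    return db_index


# In-memory stand-in for the cluster database: one dict of queues per database.
cluster = [defaultdict(deque) for _ in range(NUM_DATABASES)]
db = store_seeds(cluster, "task-1", ["http://example.com/a", "http://example.com/b"])
```

Because every seed of one task lands in a single queue, a node that is later given `task-1` only has to read that one key, which matches the independence argument made above.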
Step 104: the master node distributes the acquisition tasks stored in the cluster database to the master node and/or multiple acquisition nodes, so that the master node and/or each acquisition node reads its corresponding acquisition tasks from the cluster database; when an acquisition node fails, the master node takes back the acquisition tasks of that acquisition node and distributes them to the master node and/or the acquisition nodes other than the failed one.
In this step, preferably, the master node distributing the acquisition tasks stored in the cluster database to the master node and/or multiple acquisition nodes, so that the master node and/or each acquisition node reads its corresponding acquisition tasks from the cluster database, comprises:
the master node distributes the acquisition tasks stored in the cluster database to the master node and/or multiple acquisition nodes according to task identifier and records the allocation result in the zookeeper server, so that the master node and/or each acquisition node obtains its own task identifiers through the zookeeper server and reads the acquisition tasks from the corresponding databases of the cluster database according to those task identifiers.
As the above description shows, another important job of the master node is to divide the tasks stored in redis evenly among the acquisition nodes and to record the result in the status file in zookeeper. An acquisition node can then read the tasks assigned to it directly from zookeeper and fetch the tasks to be collected from redis.
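The even split described above can be illustrated with a round-robin deal. This is only a sketch under assumptions: the source says the tasks are divided evenly but does not name an algorithm, and `assign_tasks` and the node names are hypothetical.

```python
def assign_tasks(task_ids, node_names):
    """Deal task ids to nodes round-robin so the counts differ by at most one."""
    if not node_names:
        raise ValueError("at least one node is required")
    allocation = {name: [] for name in node_names}
    for index, task_id in enumerate(task_ids):
        # The i-th task goes to node i modulo the number of nodes.
        allocation[node_names[index % len(node_names)]].append(task_id)
    return allocation


allocation = assign_tasks(["t1", "t2", "t3", "t4", "t5"], ["spider1", "spider2"])
```

In a real deployment the resulting mapping would be what the master node writes to zookeeper; here it is simply returned as a dictionary.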
Specifically, the data structure stored in zookeeper is as follows, where each tasks node stores JSON data listing the taskids assigned to that node, and spider1, spider2, ..., spiderN denote the master node and/or acquisition nodes.
/spider/spider1/tasks
       /spider2/tasks
       /spider3/tasks
       /spiderN/tasks
       /status.properties
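A small sketch of how such entries might be built and parsed is given below. The path layout follows the listing above; the helper names (`tasks_path`, `encode_tasks`, `decode_tasks`) and the choice of a plain JSON array of taskids are assumptions, since the source only says that the tasks node holds JSON data.

```python
import json

ROOT = "/spider"  # root path taken from the listing above


def tasks_path(node_name):
    """Path of the tasks entry for one node, e.g. /spider/spider1/tasks."""
    return f"{ROOT}/{node_name}/tasks"


def encode_tasks(task_ids):
    """Serialize task ids as a JSON array in bytes (assumed payload format)."""
    return json.dumps(sorted(task_ids)).encode("utf-8")


def decode_tasks(data):
    """Parse the payload back into a list of task ids."""
    return json.loads(data.decode("utf-8"))


# Dictionary standing in for the zookeeper tree.
znodes = {
    tasks_path("spider1"): encode_tasks(["task-2", "task-1"]),
    tasks_path("spider2"): encode_tasks(["task-3"]),
}
```

Keeping one entry per node means a node only needs to read its own path to learn which queues to consume.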
Fig. 2 shows the working principle of the data acquisition method of this embodiment. Referring to Fig. 2, the master node (the acquisition master in Fig. 2) obtains acquisition tasks from the acquisition task library DB and stores them, by task ID, in the respective databases of the cluster database, such as the six Redis databases in Fig. 2. The master node then distributes the acquisition tasks stored in the cluster database to itself, the slave node (the acquisition slave in the figure) and the multiple acquisition nodes (the acquisition nodes in the figure), and records the allocation in the zookeeper server. For example, task 1 and task 2 are stored in database 1 and database 2 respectively, and both are assigned to the master node. Likewise, task 3 and task 4 are stored in database 3 and database 4 respectively and are assigned to the slave node; task 5 is stored in database 5 and is assigned to acquisition node 3; task 6 is stored in database 6 and is assigned to acquisition node N. In this way, the master node, the slave node and each acquisition node can obtain from zookeeper the task IDs that belong to it and read the acquisition tasks from the corresponding databases according to those task IDs.
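The read path of Fig. 2 can be sketched as below, using the example allocation from the paragraph above. Plain dictionaries stand in for zookeeper and for the six databases; `read_assigned_tasks`, the seed URLs and the assumption that spider1 and spider2 are the master and slave are all hypothetical.

```python
import json


def read_assigned_tasks(node_name, znodes, databases):
    """Return {task_id: seed list} for every task recorded for node_name."""
    raw = znodes.get(f"/spider/{node_name}/tasks", "[]")
    assigned = json.loads(raw)
    found = {}
    for task_id in assigned:
        # Look through the databases until the queue for this task id turns up.
        for database in databases:
            if task_id in database:
                found[task_id] = list(database[task_id])
                break
    return found


# Example data mirroring Fig. 2: task 1 in database 1, task 2 in database 2, ...
databases = [{f"task{i}": [f"http://example.com/seed{i}"]} for i in range(1, 7)]
znodes = {
    "/spider/spider1/tasks": json.dumps(["task1", "task2"]),  # master node (assumed name)
    "/spider/spider2/tasks": json.dumps(["task3", "task4"]),  # slave node (assumed name)
    "/spider/spider3/tasks": json.dumps(["task5"]),
}
master_tasks = read_assigned_tasks("spider1", znodes, databases)
```

A node with no entry simply gets an empty result, so newly registered nodes stay idle until the master assigns them something.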
In this embodiment, when an acquisition node fails, the master node takes back the acquisition tasks of that acquisition node and distributes them to the master node and/or the acquisition nodes other than the failed one, so that the failed node's acquisition tasks are completed by the master node and/or the other acquisition nodes.
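The recovery step can be sketched as follows. The source states only that the tasks are taken back and redistributed; handing each reclaimed task to the currently least-loaded survivor is an assumption of this example, as is the function name `reassign_on_failure`.

```python
def reassign_on_failure(allocation, failed_node):
    """Return a new allocation with failed_node's tasks spread over the survivors."""
    survivors = [name for name in allocation if name != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node can take over the tasks")
    new_allocation = {name: list(allocation[name]) for name in survivors}
    for task_id in allocation.get(failed_node, []):
        # Assumed policy: give each reclaimed task to the least-loaded survivor.
        target = min(survivors, key=lambda name: len(new_allocation[name]))
        new_allocation[target].append(task_id)
    return new_allocation


before = {"spider1": ["t1", "t2"], "spider3": ["t5"], "spiderN": ["t6"]}
after = reassign_on_failure(before, "spider3")
```

The input allocation is left untouched, which makes it easy to log the state before and after a failure.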
In other embodiments of the present invention, when an acquisition node fails, the master node can also send an alarm message about the failure of that acquisition node to a management center, so that the management center can inspect or replace the acquisition node in time.
In other embodiments of the present invention, in step 102 above, besides determining the master node and the acquisition nodes from the multiple nodes, the zookeeper server can also determine a slave node from the multiple nodes; the slave node is used to take over the work of the master node after the master node fails. In addition, after the master node fails, the slave node can send an alarm message about the failure of the master node to the management center, so that the management center can inspect or replace the master node in time. The alarm message can be sent by mail or SMS, and the management center in this embodiment may be an administrator, a computer, or another smart device.
Preferably, the zookeeper server can determine the first node to register with the zookeeper server as the master node, the second node to register as the slave node, and the N-th node to register as an acquisition node, where N > 2 and N is a positive integer.
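The registration-order rule can be written down directly. The sketch below assumes the registration order is already available as a list; the source does not say how zookeeper tracks that order (ephemeral sequential znodes are one common way to obtain it), and `assign_roles` is a hypothetical name.

```python
def assign_roles(registration_order):
    """Map each node to a role based on its 1-based registration position."""
    roles = {}
    for position, node in enumerate(registration_order, start=1):
        if position == 1:
            roles[node] = "master"       # first node to register
        elif position == 2:
            roles[node] = "slave"        # second node to register
        else:
            roles[node] = "acquisition"  # every N-th node with N > 2
    return roles


roles = assign_roles(["spider1", "spider2", "spider3", "spider4"])
```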
In this embodiment, whether the master node and the slave node need to be assigned acquisition tasks can be determined by configuration. Usually, the master node and the slave node take on acquisition tasks by default.
As the foregoing shows, the data acquisition method of the present invention has a good fault-tolerance mechanism. For example, after an acquisition node goes down and the master node detects this, the master node reclaims the tasks of the down acquisition node and distributes them to other acquisition nodes that are in the active state; it can also send an alarm message to the administrator.
To further increase fault tolerance, the present invention also provides a slave node. After the master node goes down, the slave node can replace the master node in performing the management work, reclaim the tasks assigned to the master node and distribute them to other acquisition nodes that are in the active state, and it can also send an alarm message to the administrator.
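The takeover by the slave node can be sketched as below. This is an illustration under assumptions: the role labels, the least-loaded redistribution policy, the fallback to the new master when no acquisition node is active, and the wording of the alarm text are not specified in the source, and `take_over` is a hypothetical name.

```python
def take_over(roles, allocation):
    """Promote the slave, redistribute the old master's tasks, build an alarm text."""
    old_master = next(name for name, role in roles.items() if role == "master")
    slave = next(name for name, role in roles.items() if role == "slave")

    # The failed master leaves the cluster view; the slave becomes the master.
    new_roles = {name: role for name, role in roles.items() if name != old_master}
    new_roles[slave] = "master"

    new_allocation = {n: list(t) for n, t in allocation.items() if n != old_master}
    collectors = [n for n, r in new_roles.items() if r == "acquisition"]
    targets = collectors or [slave]  # assumed fallback when no collector is active
    for name in targets:
        new_allocation.setdefault(name, [])
    for task_id in allocation.get(old_master, []):
        target = min(targets, key=lambda name: len(new_allocation[name]))
        new_allocation[target].append(task_id)

    alarm = f"master node {old_master} failed; {slave} took over"
    return new_roles, new_allocation, alarm


roles = {"spider1": "master", "spider2": "slave",
         "spider3": "acquisition", "spider4": "acquisition"}
allocation = {"spider1": ["t1", "t2"], "spider2": ["t3", "t4"],
              "spider3": ["t5"], "spider4": ["t6"]}
new_roles, new_allocation, alarm = take_over(roles, allocation)
```

The returned alarm string is where a mail or SMS notification to the administrator would be triggered in a real system.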
The data acquisition method of the present invention adopts a cluster mode in which the zookeeper server coordinates management and task allocation. This achieves effective monitoring and management of the state of the crawler cluster and rational scheduling of the crawler cluster, makes data acquisition more flexible, and improves its efficiency. The data acquisition method of the present invention also has a strong fault-tolerance mechanism.
The second embodiment of the present invention also provides a data acquisition system. Referring to Fig. 3, the system comprises: a zookeeper server, a cluster database and multiple nodes; the multiple nodes automatically register with the zookeeper server when they start;
the zookeeper server is configured to determine a master node and acquisition nodes from the multiple nodes;
the master node is configured to obtain acquisition tasks and store the obtained acquisition tasks in the cluster database;
the master node is further configured to distribute the acquisition tasks stored in the cluster database to the master node and/or multiple acquisition nodes, so that the master node and/or each acquisition node goes to the cluster database to carry out its corresponding acquisition tasks;
the master node is further configured to, when an acquisition node fails, take back the acquisition tasks of that acquisition node and distribute them to the master node and/or the acquisition nodes other than the failed one.
Further, the master node is specifically configured to: obtain acquisition tasks and store them, according to task identifier, in the respective databases of the cluster database.
Further, the master node is further specifically configured to: distribute the acquisition tasks stored in the cluster database to the master node and/or multiple acquisition nodes according to task identifier, and record the allocation result in the zookeeper server;
the master node is further specifically configured to: send a request signal to the zookeeper server to request the task identifiers corresponding to the master node, and read the acquisition tasks from the corresponding databases of the cluster database according to the task identifiers obtained;
and/or,
the acquisition node is specifically configured to: send a request signal to the zookeeper server to request the task identifiers corresponding to the acquisition node, and read the acquisition tasks from the corresponding databases of the cluster database according to the task identifiers obtained.
Further, the zookeeper server is also configured to determine a slave node from the multiple nodes; the slave node is used to take over the work of the master node after the master node fails.
Further, the zookeeper server is specifically configured to determine the first node to register with the zookeeper server as the master node, the second node to register as the slave node, and the N-th node to register as an acquisition node, where N > 2 and N is a positive integer.
The data acquisition system of this embodiment can be used to carry out the data acquisition method of the above embodiment. Its principle and technical effects are similar and are not described again here.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments.
Numerous specific details are described in the specification of the present invention. It will be understood, however, that embodiments of the invention can be practiced without these specific details. In some instances, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this description.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments can still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A data acquisition method, characterized in that the method comprises:
multiple nodes automatically register with a zookeeper server when they start;
the zookeeper server determines a master node and acquisition nodes from the multiple nodes;
the master node obtains acquisition tasks and stores the obtained acquisition tasks in a cluster database;
the master node distributes the acquisition tasks stored in the cluster database to the master node and/or multiple acquisition nodes, so that the master node and/or each acquisition node reads its corresponding acquisition tasks from the cluster database;
when an acquisition node fails, the master node takes back the acquisition tasks of that acquisition node and distributes them to the master node and/or the acquisition nodes other than the failed one.
2. The method according to claim 1, characterized in that the master node obtaining acquisition tasks and storing the obtained acquisition tasks in the cluster database comprises:
the master node obtains acquisition tasks and stores them, according to task identifier, in the respective databases of the cluster database.
3. The method according to claim 2, characterized in that the master node distributing the acquisition tasks stored in the cluster database to the master node and/or multiple acquisition nodes, so that the master node and/or each acquisition node reads its corresponding acquisition tasks from the cluster database, comprises:
the master node distributes the acquisition tasks stored in the cluster database to the master node and/or multiple acquisition nodes according to task identifier and records the allocation result in the zookeeper server, so that the master node and/or each acquisition node obtains its own task identifiers through the zookeeper server and reads the acquisition tasks from the corresponding databases of the cluster database according to those task identifiers.
4. The method according to claim 1, characterized in that the method further comprises:
when an acquisition node fails, the master node sends an alarm message about the failure of that acquisition node to a management center.
5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
the zookeeper server determines a slave node from the multiple nodes, the slave node being used to take over the work of the master node after the master node fails.
6. The method according to claim 5, characterized in that the method further comprises:
the slave node sends an alarm message about the failure of the master node to the management center.
7. The method according to claim 5, characterized in that the zookeeper server determines the first node to register with the zookeeper server as the master node, the second node to register as the slave node, and the N-th node to register as an acquisition node, where N > 2 and N is a positive integer.
8. A data acquisition system, characterized in that it comprises: a zookeeper server, a cluster database and multiple nodes; the multiple nodes automatically register with the zookeeper server when they start;
the zookeeper server is configured to determine a master node and acquisition nodes from the multiple nodes;
the master node is configured to obtain acquisition tasks and store the obtained acquisition tasks in the cluster database;
the master node is further configured to distribute the acquisition tasks stored in the cluster database to the master node and/or multiple acquisition nodes, so that the master node and/or each acquisition node goes to the cluster database to carry out its corresponding acquisition tasks;
the master node is further configured to, when an acquisition node fails, take back the acquisition tasks of that acquisition node and distribute them to the master node and/or the acquisition nodes other than the failed one.
9. The system according to claim 8, characterized in that the master node is specifically configured to: obtain acquisition tasks and store them, according to task identifier, in the respective databases of the cluster database.
10. The system according to claim 9, characterized in that the master node is further specifically configured to: distribute the acquisition tasks stored in the cluster database to the master node and/or multiple acquisition nodes according to task identifier, and record the allocation result in the zookeeper server;
the master node is further specifically configured to: send a request signal to the zookeeper server to request the task identifiers corresponding to the master node, and read the acquisition tasks from the corresponding databases of the cluster database according to the task identifiers obtained;
and/or,
the acquisition node is specifically configured to: send a request signal to the zookeeper server to request the task identifiers corresponding to the acquisition node, and read the acquisition tasks from the corresponding databases of the cluster database according to the task identifiers obtained.
CN201510765008.1A 2015-11-10 2015-11-10 Data acquisition method and system Pending CN105447097A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510765008.1A CN105447097A (en) 2015-11-10 2015-11-10 Data acquisition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510765008.1A CN105447097A (en) 2015-11-10 2015-11-10 Data acquisition method and system

Publications (1)

Publication Number Publication Date
CN105447097A true CN105447097A (en) 2016-03-30

Family

ID=55557275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510765008.1A Pending CN105447097A (en) 2015-11-10 2015-11-10 Data acquisition method and system

Country Status (1)

Country Link
CN (1) CN105447097A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102779185A (en) * 2012-06-29 2012-11-14 浙江大学 High-availability distributed full-text indexing method
CN103092698A (en) * 2012-12-24 2013-05-08 中国科学院深圳先进技术研究院 System and method of cloud computing application automatic deployment
CN103227840A (en) * 2013-05-24 2013-07-31 上海和伍新材料科技有限公司 IOT (Internet of things)-oriented high-concurrency high-availability data acquisition system
CN104320433A (en) * 2014-09-28 2015-01-28 北京京东尚科信息技术有限公司 Data processing method and distributed data processing system
CN104391989A (en) * 2014-12-16 2015-03-04 浪潮电子信息产业股份有限公司 Distributed ETL all-in-one machine system
CN104866378A (en) * 2015-05-29 2015-08-26 北京京东尚科信息技术有限公司 System and method for coordinating execution tasks

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
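任乐乐 et al.: "An improved master-slave node election algorithm for cluster load balancing", Journal of China Jiliang University *
李娜娜: "Research on social network data acquisition technology on a cloud computing platform", China Master's Theses Full-text Database, Information Science and Technology *
郑圆杰: "Design and implementation of a very-large-scale virtual network platform in cloud computing", China Master's Theses Full-text Database, Information Science and Technology *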

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341051A (en) * 2016-05-03 2017-11-10 北京京东尚科信息技术有限公司 Cluster task coordination method, system and device
CN106126346B (en) * 2016-07-05 2019-02-26 东北大学 Large-scale distributed data collection system and method
CN106126346A (en) * 2016-07-05 2016-11-16 东北大学 Large-scale distributed data collection system and method
CN107800737A (en) * 2016-09-05 2018-03-13 中国移动通信有限公司研究院 Method and apparatus for determining the master node in a server cluster, and server cluster
CN106534259A (en) * 2016-09-30 2017-03-22 山东大学 Web data collection method based on Docker, Web data collection Web server based on Docker and Web data collection system based on Docker
CN106534259B (en) * 2016-09-30 2019-08-13 山东大学 Web data acquisition method, Web server and web data acquisition system based on Docker
CN106528769A (en) * 2016-11-04 2017-03-22 乐视控股(北京)有限公司 Data acquisition method and apparatus
WO2018099067A1 (en) * 2016-11-29 2018-06-07 上海壹账通金融科技有限公司 Distributed task scheduling method and system
CN107193960A (en) * 2017-05-24 2017-09-22 南京大学 Distributed crawler system and periodic incremental grabbing method
CN107193960B (en) * 2017-05-24 2020-11-10 南京大学 Distributed crawler system and periodic incremental grabbing method
CN107562541B (en) * 2017-09-05 2020-08-11 广东科杰通信息科技有限公司 Load balancing distributed crawler method and crawler system
CN107562541A (en) * 2017-09-05 2018-01-09 广东科杰通信息科技有限公司 Load balancing distributed crawler method and crawler system
CN109542595B (en) * 2017-09-21 2023-02-24 阿里巴巴集团控股有限公司 Data acquisition method, device and system
CN109542595A (en) * 2017-09-21 2019-03-29 阿里巴巴集团控股有限公司 Data acquisition method, device and system
CN109656690A (en) * 2017-10-11 2019-04-19 阿里巴巴集团控股有限公司 Scheduling system, method and storage medium
WO2019079967A1 (en) * 2017-10-24 2019-05-02 麦格创科技(深圳)有限公司 Method for allocating task manager in distributed crawler system and system
CN107800789A (en) * 2017-10-24 2018-03-13 麦格创科技(深圳)有限公司 Method and system for allocating a task manager in a distributed crawler system
CN109698785A (en) * 2017-10-24 2019-04-30 广东亿迅科技有限公司 Distributed high-concurrency real-time message push method and device
CN107896175A (en) * 2017-11-30 2018-04-10 北京小度信息科技有限公司 Data acquisition method and device
CN108304255A (en) * 2017-12-29 2018-07-20 北京城市网邻信息技术有限公司 Distributed task scheduling method and device, electronic device and readable storage medium
CN108769115A (en) * 2018-04-19 2018-11-06 中国科学院计算技术研究所 Distributed RSS data acquisition method and system
CN109298937A (en) * 2018-09-19 2019-02-01 中国联合网络通信集团有限公司 Document analysis method and network device
CN109688106B (en) * 2018-11-19 2020-03-31 中国科学院信息工程研究所 Data collaborative acquisition method and system
CN109688106A (en) * 2018-11-19 2019-04-26 中国科学院信息工程研究所 Data collaborative acquisition method and system
CN109587138A (en) * 2018-12-06 2019-04-05 中电工业互联网有限公司 Fault-tolerant dynamic scheduling method for service nodes of an Internet of Things system, and server
CN109977161A (en) * 2019-03-28 2019-07-05 上海中通吉网络技术有限公司 Monitoring system for a Presto cluster
CN111290854A (en) * 2020-01-20 2020-06-16 腾讯科技(深圳)有限公司 Task management method, device and system, computer storage medium and electronic equipment
CN111290854B (en) * 2020-01-20 2024-03-15 腾讯云计算(北京)有限责任公司 Task management method, device, system, computer storage medium and electronic equipment
CN111865899A (en) * 2020-06-02 2020-10-30 中国科学院信息工程研究所 Threat-driven cooperative acquisition method and device
CN111865899B (en) * 2020-06-02 2021-07-13 中国科学院信息工程研究所 Threat-driven cooperative acquisition method and device
CN112306720B (en) * 2020-11-23 2022-06-21 迈普通信技术股份有限公司 Service system cluster management method
CN112306720A (en) * 2020-11-23 2021-02-02 迈普通信技术股份有限公司 Service system cluster management method
CN112584421A (en) * 2020-12-01 2021-03-30 深圳力维智联技术有限公司 FSU management method and device and computer readable storage medium
CN113722263A (en) * 2021-09-06 2021-11-30 浪潮通用软件有限公司 Cluster data acquisition method
CN113722263B (en) * 2021-09-06 2023-07-14 浪潮通用软件有限公司 Cluster data acquisition method

Similar Documents

Publication Publication Date Title
CN105447097A (en) Data acquisition method and system
CN108881495B (en) Resource allocation method, device, computer equipment and storage medium
EP3180695B1 (en) Systems and methods for auto-scaling a big data system
CN108039964B (en) Fault processing method, device and system based on network function virtualization
CN104486445A (en) Distributed extendable resource monitoring system and method based on cloud platform
CN105049268A (en) Distributed computing resource allocation system and task processing method
CN106789362A (en) A kind of device management method and network management system
US20160142262A1 (en) Monitoring a computing network
CN105915405A (en) Large-scale cluster node performance monitoring system
CN104767794B (en) Node electoral machinery and node in a kind of distributed system
CN112231108A (en) Task processing method and device, computer readable storage medium and server
CN110855481B (en) Data acquisition system and method
CN106375103B (en) Alarm data acquisition and transmission method
US20200272526A1 (en) Methods and systems for automated scaling of computing clusters
CN109818785B (en) Data processing method, server cluster and storage medium
US10785102B2 (en) Modifying distributed application based on cloud diagnostic data
CN106899659B (en) Distributed system and management method and management device thereof
CN106021026B (en) Backup method and device
CN112149975A (en) APM monitoring system and method based on artificial intelligence
CN104038364A (en) Distributed flow processing system fault tolerance method, nodes and system
US9973569B2 (en) System, method and computing apparatus to manage process in cloud infrastructure
CN107547622B (en) Resource adjusting method and device
CN105740054A (en) Virtual machine management method and device
CN109213769B (en) Data conflict identification method for data object
CN111614702A (en) Edge calculation method and edge calculation system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160330