CN109857558A - Data flow processing method and system - Google Patents

Data flow processing method and system

Publication number
CN109857558A
Authority
CN
China
Prior art keywords
node
task
host
worker
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910048043.XA
Other languages
Chinese (zh)
Inventor
郭业俊
李�浩
王志强
孙迁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suningcom Group Co Ltd
Original Assignee
Suningcom Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suningcom Group Co Ltd
Priority to CN201910048043.XA (patent CN109857558A)
Publication of CN109857558A
Priority to CA3168286A (patent CA3168286A1)
Priority to PCT/CN2019/106779 (patent WO2020147330A1)
Pending (current legal status)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Hardware Redundancy (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data flow processing method and system in the field of big data processing. The method includes: determining, by a ZooKeeper cluster, one of several Master nodes as the primary node; providing, by the primary node, an external interface to receive an online request for a service, and allocating tasks to the service; generating, by the primary node, configuration information for the tasks according to the current state information reported by each of multiple Worker nodes, and writing it to the ZooKeeper cluster, the configuration information including information indicating the Worker node scheduled to execute each task; and, if a Worker node detects that the ZooKeeper cluster contains a task scheduled to itself, starting a Flume service to execute it. Embodiments of the invention can make the Master nodes and Worker nodes highly available, improve the availability of the Flume service, and avoid uneven and wasted resource use. They can also greatly simplify bringing services online and taking them offline, and reduce interference between services.

Description

Data flow processing method and system
Technical field
The present invention relates to the field of big data processing, and in particular to a data flow processing method and system.
Background art
In the prior art, a Flume service is usually started on each node of a cluster to transfer data from the data source (source) to the sink.
In the course of implementing the present invention, the inventors found the following problems. When a common fault (such as a deadlock or a consumption exception) occurs on a node device, the system is unaware of it and manual handling is required, which delays troubleshooting. In addition, every node in the same cluster uses the same configuration, but the volume of service data varies from node to node, which tends to leave a high proportion of Flume collection threads idle. Furthermore, new services are brought online frequently, and each one requires manually modifying the service configuration file and restarting the entire cluster, which disrupts the normal execution of other services in the same cluster.
Summary of the invention
The present invention aims to solve at least one of the technical problems in the prior art or the related art. To this end, the present invention provides a data flow processing method and system.
The specific technical solutions provided by embodiments of the present invention are as follows:
In a first aspect, a data flow processing method is provided, the method comprising:
determining, by a ZooKeeper cluster, one of several Master nodes as the primary node;
providing, by the primary node, an external interface to receive an online request for a service, and allocating a task to the service; and
generating configuration information for the task according to the current state information reported by each of multiple Worker nodes, and writing it to the ZooKeeper cluster, the configuration information including information indicating the Worker node scheduled to execute the task;
if the Worker node detects that the ZooKeeper cluster contains a task scheduled to itself, starting a Flume service to execute the task.
Further, determining one of the several Master nodes as the primary node by the ZooKeeper cluster comprises:
the ZooKeeper cluster receiving a primary-node election request initiated by a Master node in response to a preset trigger event and, after a successful election, making that Master node the primary node, wherein the preset trigger event is one of the following:
the Master node is started;
the Master node currently serving as the primary node fails.
Further, generating the configuration information of the task according to the current state information reported by each of the multiple Worker nodes comprises:
determining, according to the host running-state information reported by each of the multiple Worker nodes, the target Worker node whose host running state is the best among the multiple Worker nodes;
generating configuration information indicating that the task is scheduled to the target Worker node.
Further, the method also comprises:
adjusting, by the primary node, the configuration information of the tasks according to the host running-state information and the task execution-state information reported by each of the multiple Worker nodes;
wherein the adjusted configuration information indicates that tasks in an idle state are to be scaled down and tasks in a backlogged state are to be scaled up; and
that tasks on Worker nodes whose host load is overloaded are to be migrated to Worker nodes whose host load is idle for execution.
Further, the method also comprises:
receiving, by the primary node, an offline request for the service through the external interface; and
writing the offline information of the service and the offline information of the tasks allocated to the service to the ZooKeeper cluster, so that the Worker nodes executing those tasks stop the Flume service.
In a second aspect, a data flow processing system is provided, the system comprising a ZooKeeper cluster, several Master nodes, and multiple Worker nodes, wherein:
the ZooKeeper cluster is configured to determine one of the several Master nodes as the primary node;
the primary node is configured to provide an external interface to receive an online request for a service, and to allocate a task to the service;
the primary node is further configured to generate configuration information for the task according to the current state information reported by each of the multiple Worker nodes, and to write it to the ZooKeeper cluster, the configuration information including information indicating the Worker node scheduled to execute the task;
the Worker node is configured to start a Flume service to execute the task if it detects that the ZooKeeper cluster contains a task scheduled to itself.
Further, the ZooKeeper cluster is specifically configured to:
receive a primary-node election request initiated by a Master node in response to a preset trigger event and, after a successful election, make that Master node the primary node, wherein the preset trigger event is one of the following:
the Master node is started;
the Master node currently serving as the primary node fails.
Further, the primary node is specifically configured to:
determine, according to the host running-state information reported by each of the multiple Worker nodes, the target Worker node whose host running state is the best among the multiple Worker nodes;
generate configuration information indicating that the task is scheduled to the target Worker node.
Further, the primary node is also specifically configured to:
adjust the configuration information of the tasks according to the host running-state information and the task execution-state information reported by each of the multiple Worker nodes;
wherein the adjusted configuration information indicates that tasks in an idle state are to be scaled down and tasks in a backlogged state are to be scaled up; and
that tasks on Worker nodes whose host load is overloaded are to be migrated to Worker nodes whose host load is idle for execution.
Further, the primary node is also specifically configured to:
receive an offline request for the service through the external interface; and
write the offline information of the service and the offline information of the tasks allocated to the service to the ZooKeeper cluster, so that the Worker nodes executing those tasks stop the Flume service.
The technical solutions provided by embodiments of the present invention have the following beneficial effects:
1. Because the ZooKeeper cluster determines one of several Master nodes as the primary node, the Master nodes achieve high availability through the ZooKeeper cluster: if one Master node fails, another Master node can quickly take over and provide external services within a short time. This improves the availability of the Flume service and also solves the prior-art problem that faults on node devices are not handled promptly.
2. Because the primary node provides an external interface, users can call that interface directly to bring tasks online or take them offline, which can shorten the time needed to bring a service online or take it offline to within 1 minute and greatly simplifies these operations. Moreover, updating a configuration requires neither manually modifying the service configuration file nor restarting the cluster; only the affected service needs to be restarted, which reduces interference between services.
3. The primary node generates the task configuration information according to the host running-state information reported by each of the Worker nodes and writes it to the ZooKeeper cluster, and a Worker node that detects a task scheduled to itself in the ZooKeeper cluster starts a Flume service to execute it. Configuration is thus managed centrally through the ZooKeeper cluster, which avoids an excessive proportion of idle Flume collection threads, solves the problem of uneven and wasted resource use, and makes operation and maintenance more convenient.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a data flow processing method provided by Embodiment 1 of the present invention;
Fig. 2 is a block diagram of a data flow processing system provided by Embodiment 2 of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
In the description of the present application, it should be understood that the terms "first", "second", and the like are used for descriptive purposes only and are not to be understood as indicating or implying relative importance. In addition, unless otherwise stated, "multiple" means two or more.
Before the embodiments of the present invention are introduced, several technical terms are briefly explained:
ZooKeeper: a sub-project of Hadoop; a reliable coordination system for large-scale distributed systems whose functions include configuration maintenance, naming service, distributed synchronization, and group services.
Flume: a highly available, highly reliable, distributed technology for collecting, aggregating, and transporting massive volumes of logs.
An embodiment of the present invention provides a data flow processing method that can be executed by a data flow processing system. The system uses a distributed architecture and includes a ZooKeeper cluster, several Master nodes, and multiple Worker nodes. Each Master node and each Worker node must first register a node in the ZooKeeper cluster so that the ZooKeeper cluster can manage all Master nodes and Worker nodes centrally. Each Master node and each Worker node may be deployed on its own hardware server, and the numbers of Master nodes and Worker nodes may be chosen by the user according to the application scenario; they are not specifically limited here. In embodiments of the present invention, each Worker node starts its own Flume service, which acts as a data transfer tool responsible for moving data from the data source (source) to the sink.
Fig. 1 is a flowchart of a data flow processing method provided by Embodiment 1 of the present invention. As shown in Fig. 1, the method may include the following steps:
101. Determine, by the ZooKeeper cluster, one of several Master nodes as the primary node.
There may be two Master nodes: when one of them is determined to be the primary node, the other is the standby node. A Master node that has been determined to be the primary node can provide external services: it exposes an external interface for bringing services online, taking them offline, viewing them, and modifying them, and it is responsible for task scheduling.
Specifically, the ZooKeeper cluster receives a primary-node election request initiated by a Master node in response to a preset trigger event and, after a successful election, makes that Master node the primary node.
In one exemplary implementation, the preset trigger event may be that the Master node is started.
Specifically, after the start command is executed, the Master node sends the ZooKeeper cluster a request to take part in the election of the primary node (i.e., the Leader node). If the ZooKeeper cluster determines that an Active Master node already exists as the primary node, the Master node loses the election and the ZooKeeper cluster records its state as Standby. If the ZooKeeper cluster determines that no Active Master node exists as the primary node, the Master node wins the election and the ZooKeeper cluster records its state as Active. The Master node in the Active state provides services externally.
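The Active/Standby decision can be illustrated with a minimal sketch. It assumes an in-memory stand-in for the ZooKeeper cluster; `FakeCoordinator`, `request_election`, and `report_failure` are hypothetical names introduced here for illustration and are not part of ZooKeeper or of the source.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FakeCoordinator:
    """In-memory stand-in for the ZooKeeper cluster (hypothetical, not a ZooKeeper API)."""
    leader: Optional[str] = None                          # id of the Active Master, if any
    states: Dict[str, str] = field(default_factory=dict)  # master id -> "Active" / "Standby"

    def request_election(self, master_id: str) -> str:
        """Handle an election request sent by a Master node."""
        if self.leader is None or self.leader == master_id:
            # No other Active Master exists: the requester wins the election.
            self.leader = master_id
            self.states[master_id] = "Active"
        else:
            # An Active Master already exists: the requester is recorded as Standby.
            self.states[master_id] = "Standby"
        return self.states[master_id]

    def report_failure(self, master_id: str) -> None:
        """Record that a Master node failed; free the leader slot if it was Active."""
        self.states.pop(master_id, None)
        if self.leader == master_id:
            self.leader = None


zk = FakeCoordinator()
first = zk.request_election("master-1")   # wins: no Active Master yet
second = zk.request_election("master-2")  # loses: master-1 is already Active
zk.report_failure("master-1")             # the Active Master fails
third = zk.request_election("master-2")   # the standby asks again and takes over
print(first, second, third)
```

In a real deployment the leader slot would typically be an ephemeral node that disappears when the Active Master's session ends, so no explicit failure report would be needed; the explicit call here only keeps the sketch self-contained.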
In another exemplary implementation, the preset trigger event may be that the Master node currently serving as the primary node fails.
Specifically, if the Master node currently serving as the primary node fails, the ZooKeeper cluster obtains the fault information of that Master node, receives the election requests sent by the other Master nodes, and determines one of those other Master nodes as the primary node within a preset time. The election may, for example, choose the Master node with the best host performance among the other Master nodes as the primary node; embodiments of the present invention do not limit this.
It should be noted that, once a Master node has been elected as the primary node, it starts several functional modules, which may specifically include RestServer, RPCServer, and Scheduler: RestServer handles creating, deleting, modifying, and querying services; RPCServer receives the current state information reported by the Worker nodes; and Scheduler dynamically schedules and assigns Tasks.
In addition, the ZooKeeper cluster also stores working directories, including but not limited to:
Leader: an ephemeral node used for Master node election;
Master: stores the service URL of the Active Master;
Jobs: a directory node that stores Job data;
Workers: ephemeral nodes used for Worker node discovery;
Assign/<worker-id>/: the list of assigned tasks, with one directory node per Worker node.
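As a rough illustration of this layout, the directory names can be turned into path helpers. The root path `/dataflow`, the helper names, and the per-Worker and per-task child nodes are assumptions made for this sketch; the source names only the directories themselves.

```python
ROOT = "/dataflow"  # hypothetical root path; the source does not name one

LEADER = f"{ROOT}/Leader"    # ephemeral node used for Master election
MASTER = f"{ROOT}/Master"    # holds the service URL of the Active Master
JOBS = f"{ROOT}/Jobs"        # directory node holding Job data
WORKERS = f"{ROOT}/Workers"  # Worker discovery


def worker_node(worker_id: str) -> str:
    """Ephemeral node announcing one live Worker (assumed to sit under Workers)."""
    return f"{WORKERS}/{worker_id}"


def assign_dir(worker_id: str) -> str:
    """Directory node listing the tasks assigned to one Worker."""
    return f"{ROOT}/Assign/{worker_id}"


def assigned_task(worker_id: str, task_id: str) -> str:
    """Node for a single assigned task (the per-task child node is an assumption)."""
    return f"{assign_dir(worker_id)}/{task_id}"


print(assigned_task("worker-31", "orders-task-0"))
```

Giving each Worker its own directory means a Worker only has to watch one path to learn about its assignments.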
In this embodiment of the present invention, because the ZooKeeper cluster determines one of several Master nodes as the primary node, the Master nodes achieve high availability through the ZooKeeper cluster: if one Master node fails, another Master node can quickly take over and provide external services within a short time. This also solves the prior-art problem that faults on node devices are not handled promptly.
102. Provide, by the primary node, an external interface to receive an online request for a service, and allocate tasks to the service.
Specifically, the primary node determined from the several Master nodes provides an external interface that can be used to receive online requests for services (i.e., Jobs). A user can call the external interface through a service application interface to bring a service online. The online request carries the service's configuration information, passed in JSON format. The primary node initializes the service parameters according to that configuration information and allocates tasks to the service. A task (i.e., a Task) is the execution unit of a Job on a Worker node; it is responsible for reading data from the source and transferring it to the sink.
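How a JSON online request might be turned into Tasks can be sketched as follows. The source states only that the configuration is passed as JSON; every field name used here (`name`, `source`, `sink`, `parallelism`) and the one-Task-per-unit-of-parallelism rule are assumptions for illustration.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    task_id: str
    job_name: str
    source: str  # where the Task reads data from
    sink: str    # where the Task writes data to


def allocate_tasks(request_body: str) -> List[Task]:
    """Parse a JSON online request and split the Job into Tasks.

    All field names are hypothetical; the source only says the config is JSON.
    """
    cfg = json.loads(request_body)
    parallelism = int(cfg.get("parallelism", 1))  # default: one Task per Job
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return [
        Task(f"{cfg['name']}-task-{i}", cfg["name"], cfg["source"], cfg["sink"])
        for i in range(parallelism)
    ]


request = json.dumps(
    {"name": "orders", "source": "kafka://orders", "sink": "hdfs:///logs/orders", "parallelism": 2}
)
tasks = allocate_tasks(request)
print([t.task_id for t in tasks])
```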
In this embodiment of the present invention, because the primary node provides an external interface, users can call that interface directly to bring tasks online or take them offline, which can shorten the time needed to bring a service online or take it offline to within 1 minute and simplifies these operations. Moreover, updating a configuration requires neither manually modifying the service configuration file nor restarting the cluster; only the affected service needs to be restarted, which reduces interference between services.
103. Generate, by the primary node, the configuration information of the task according to the host running-state information reported by each of the multiple Worker nodes, and write it to the ZooKeeper cluster, the configuration information including information indicating the Worker node scheduled to execute the task.
The host running-state information includes one or more of CPU usage, memory usage, disk read/write activity, and network upstream/downstream traffic.
Specifically, this process may include:
determining, according to the host running-state information reported by each of the multiple Worker nodes, the target Worker node whose host running state is the best among the multiple Worker nodes; and generating configuration information indicating that the task is scheduled to the target Worker node.
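One way to pick the Worker with the best host running state is to fold the reported metrics into a single load score and take the minimum. The source does not define what "best" means, so the weighted sum, the weights, and the names `HostState`, `load_score`, and `schedule` are all assumptions in this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class HostState:
    worker_id: str
    cpu: float      # CPU usage, 0.0 to 1.0
    memory: float   # memory usage, 0.0 to 1.0
    disk_io: float  # disk read/write utilisation, 0.0 to 1.0
    network: float  # network utilisation, 0.0 to 1.0


def load_score(s: HostState) -> float:
    """Lower is better. The weights are illustrative assumptions."""
    return 0.4 * s.cpu + 0.3 * s.memory + 0.15 * s.disk_io + 0.15 * s.network


def schedule(task_id: str, reports: Iterable[HostState]) -> Dict[str, str]:
    """Build a task configuration that names the least-loaded Worker as the target."""
    candidates = list(reports)
    if not candidates:
        raise ValueError("no Worker has reported its state")
    target = min(candidates, key=load_score)
    return {"task_id": task_id, "worker_id": target.worker_id}


reports = [
    HostState("worker-31", cpu=0.80, memory=0.70, disk_io=0.20, network=0.30),
    HostState("worker-32", cpu=0.20, memory=0.30, disk_io=0.10, network=0.10),
    HostState("worker-33", cpu=0.50, memory=0.40, disk_io=0.60, network=0.20),
]
config = schedule("orders-task-0", reports)
print(config)
```

The returned dictionary plays the role of the configuration information that the primary node would write to the ZooKeeper cluster.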
In this embodiment, each Worker node starts a Report thread after it is started. The thread monitors the running state of the node's own host, generates host running-state information, and reports it to the Master node serving as the primary node.
104. If a Worker node detects that the ZooKeeper cluster contains a task scheduled to itself, it starts a Flume service to execute the task.
Specifically, each Worker node watches the state of the ZooKeeper cluster. If it detects that a new task has been added to the ZooKeeper cluster and that the task is scheduled to itself, it obtains the task's configuration information from the ZooKeeper cluster and starts a Flume service to execute the task. A Flume service is deployed on each Worker node.
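The watch-and-start behaviour can be sketched with an in-memory stand-in for the assignment directory. `FakeAssignStore` and `Worker` are hypothetical classes for illustration only; a real Worker would register a watch with ZooKeeper and launch a Flume agent where this sketch merely records the task.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class FakeAssignStore:
    """In-memory stand-in for the assignment directory in ZooKeeper (hypothetical)."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Dict[str, dict]] = defaultdict(dict)
        self._watchers: Dict[str, List[Callable[[str, dict], None]]] = defaultdict(list)

    def watch(self, worker_id: str, callback: Callable[[str, dict], None]) -> None:
        """Register a callback fired when a task is assigned to worker_id."""
        self._watchers[worker_id].append(callback)

    def assign(self, worker_id: str, task_id: str, config: dict) -> None:
        """Primary node side: store the task config, then notify that Worker's watchers."""
        self._tasks[worker_id][task_id] = config
        for callback in self._watchers[worker_id]:
            callback(task_id, config)


class Worker:
    def __init__(self, worker_id: str, store: FakeAssignStore) -> None:
        self.worker_id = worker_id
        self.running: Dict[str, dict] = {}
        store.watch(worker_id, self.on_task_assigned)  # watch only this Worker's directory

    def on_task_assigned(self, task_id: str, config: dict) -> None:
        # A real Worker would start a Flume service here; this sketch only records the task.
        self.running[task_id] = config


store = FakeAssignStore()
w31, w32 = Worker("worker-31", store), Worker("worker-32", store)
store.assign("worker-32", "orders-task-0", {"source": "kafka://orders", "sink": "hdfs:///logs/orders"})
print(sorted(w31.running), sorted(w32.running))
```

Only the Worker named in the assignment reacts, which mirrors the "scheduled to itself" condition in step 104.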
In this embodiment of the present invention, the primary node generates the task configuration information and writes it to the ZooKeeper cluster, and a Worker node that detects a task scheduled to itself in the ZooKeeper cluster starts a Flume service to execute it. Configuration is thus managed centrally through the ZooKeeper cluster, which avoids an excessive proportion of idle Flume collection threads, solves the problem of uneven and wasted resource use, and makes operation and maintenance more convenient.
As a further preferred embodiment, the method provided by the embodiment of the present invention may also include:
adjusting, by the primary node, the configuration information of the tasks according to the host running-state information and the task execution-state information reported by each of the multiple Worker nodes.
The adjusted configuration information indicates that tasks in an idle state are to be scaled down and tasks in a backlogged state are to be scaled up; and
that tasks on Worker nodes whose host load is overloaded are to be migrated to Worker nodes whose host load is idle for execution.
The task execution-state information includes the task's execution speed and its backlog.
Specifically, each Worker node can send the execution-state information of its local tasks and its host running-state information to the primary node;
the primary node determines, from the task execution-state information, which tasks are idle and which are backlogged, adds the idle tasks (Idle Tasks) and the backlogged tasks (Busy Tasks) to the IdleTask queue and the BusyTask queue respectively, and automatically scales down the IdleTasks and scales up the BusyTasks;
the primary node determines, from the host running-state information of each Worker node, which Worker nodes have overloaded hosts and which have idle hosts, and migrates tasks from the Worker nodes whose host load is overloaded to Worker nodes whose host load is idle for execution.
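The queueing, scaling, and migration steps above can be sketched as follows. The thresholds, the use of a thread count as the unit of scaling, and the rule of moving at most one task per overloaded host per pass are all assumptions for illustration; the source does not specify any of them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TaskState:
    task_id: str
    worker_id: str
    rate: float   # execution speed, records per second
    backlog: int  # records waiting to be processed
    threads: int  # current degree of parallelism


# Illustrative thresholds, not values taken from the source.
IDLE_RATE, BUSY_BACKLOG = 10.0, 10_000
OVERLOADED, IDLE_LOAD = 0.85, 0.30


def classify(tasks: List[TaskState]) -> Tuple[List[TaskState], List[TaskState]]:
    """Split tasks into the IdleTask queue and the BusyTask queue."""
    idle = [t for t in tasks if t.rate < IDLE_RATE and t.backlog == 0]
    busy = [t for t in tasks if t.backlog > BUSY_BACKLOG]
    return idle, busy


def resize(idle: List[TaskState], busy: List[TaskState]) -> None:
    """Scale idle tasks down (never below one thread) and backlogged tasks up."""
    for t in idle:
        t.threads = max(1, t.threads - 1)
    for t in busy:
        t.threads += 1


def migrate(tasks: List[TaskState], host_load: Dict[str, float]) -> List[str]:
    """Move at most one task per overloaded host to the least-loaded idle host."""
    idle_hosts = sorted((w for w, load in host_load.items() if load <= IDLE_LOAD), key=host_load.get)
    moved: List[str] = []
    if not idle_hosts:
        return moved
    relieved = set()
    for t in tasks:
        if host_load[t.worker_id] >= OVERLOADED and t.worker_id not in relieved:
            relieved.add(t.worker_id)
            t.worker_id = idle_hosts[0]
            moved.append(t.task_id)
    return moved


tasks = [
    TaskState("a-0", "worker-31", rate=2.0, backlog=0, threads=3),
    TaskState("b-0", "worker-31", rate=900.0, backlog=50_000, threads=2),
    TaskState("c-0", "worker-32", rate=300.0, backlog=100, threads=1),
]
host_load = {"worker-31": 0.92, "worker-32": 0.55, "worker-33": 0.10}
idle, busy = classify(tasks)
resize(idle, busy)
moved = migrate(tasks, host_load)
print(moved, [(t.task_id, t.worker_id, t.threads) for t in tasks])
```

Limiting each pass to one move per overloaded host is a conservative choice that avoids shifting the overload onto the destination; a production scheduler would likely re-evaluate load after every move.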
In this embodiment of the present invention, the primary node adjusts the configuration information of the tasks according to the host running-state information and the task execution-state information reported by each of the Worker nodes. When the host load of a Worker node is high and the load across the cluster's machines is unbalanced, the Tasks on the heavily loaded Worker node can be migrated to other Worker nodes for execution. This balances the load of Worker nodes and Tasks across the cluster, makes the Worker nodes highly available, improves the availability of the Flume service, and also solves the problem of uneven and wasted resource use.
As a further preferred embodiment, the method provided by the embodiment of the present invention may also include:
receiving, by the primary node, an offline request for a service through the external interface; and
writing the offline information of the service and the offline information of the tasks allocated to the service to the ZooKeeper cluster, so that the Worker nodes executing those tasks stop the Flume service.
Specifically, the primary node provides an external interface to receive an offline request for a service (i.e., a Job). The primary node marks the status of the service being taken offline as offline, and then stores the offline information of the service and the offline information of the tasks allocated to the service in the ZooKeeper cluster. When a Worker node executing one of those tasks detects the task's offline information in the ZooKeeper cluster, it executes the stop command for the Task and shuts down the Flume service. Once all tasks related to the service have stopped executing, the service is offline.
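The offline sequence can be sketched with an in-memory stand-in for the status records kept in ZooKeeper. `FakeStore`, `take_offline`, `worker_poll`, and `job_fully_offline` are hypothetical names; the Worker side polls here for simplicity, whereas the source describes the Worker detecting the change in the ZooKeeper cluster.

```python
from typing import Dict, List


class FakeStore:
    """In-memory stand-in for Job and Task status records in ZooKeeper (hypothetical)."""

    def __init__(self) -> None:
        self.job_status: Dict[str, str] = {}
        self.task_status: Dict[str, str] = {}
        self.job_tasks: Dict[str, List[str]] = {}


def take_offline(store: FakeStore, job: str) -> None:
    """Primary node side: mark the Job and each of its Tasks as offline."""
    store.job_status[job] = "offline"
    for task_id in store.job_tasks[job]:
        store.task_status[task_id] = "offline"


def worker_poll(store: FakeStore, running: Dict[str, bool]) -> None:
    """Worker side: stop every running Task whose status is now offline."""
    for task_id, is_running in running.items():
        if is_running and store.task_status.get(task_id) == "offline":
            running[task_id] = False  # a real Worker would stop the Flume service here


def job_fully_offline(store: FakeStore, job: str, workers: List[Dict[str, bool]]) -> bool:
    """The Job is offline only once none of its Tasks is running on any Worker."""
    return all(not running.get(t, False) for t in store.job_tasks[job] for running in workers)


store = FakeStore()
store.job_tasks["orders"] = ["orders-task-0", "orders-task-1"]
store.task_status.update({"orders-task-0": "online", "orders-task-1": "online", "clicks-task-0": "online"})
w31 = {"orders-task-0": True}
w32 = {"orders-task-1": True, "clicks-task-0": True}

take_offline(store, "orders")
before = job_fully_offline(store, "orders", [w31, w32])  # Workers have not reacted yet
worker_poll(store, w31)
worker_poll(store, w32)
after = job_fully_offline(store, "orders", [w31, w32])
print(before, after, w32["clicks-task-0"])
```

Tasks belonging to other services keep running on the same Worker, which reflects the stated goal of reducing interference between services.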
Fig. 2 is a block diagram of a data flow processing system provided by Embodiment 2 of the present invention. The system includes a ZooKeeper cluster 10, several Master nodes 20, and multiple Worker nodes 30. As shown in Fig. 2, the number of Master nodes 20 may be set to two, namely Master node 21 and Master node 22; when one of them serves as the primary node, the other serves as the standby node. The number of Worker nodes 30 may be set to three, namely Worker node 31, Worker node 32, and Worker node 33, wherein:
the ZooKeeper cluster is configured to determine one of the several Master nodes as the primary node;
the primary node is configured to provide an external interface to receive an online request for a service, and to allocate a task to the service;
the primary node is further configured to generate configuration information for the task according to the current state information reported by each of the multiple Worker nodes, and to write it to the ZooKeeper cluster, the configuration information including information indicating the Worker node scheduled to execute the task;
the Worker node is configured to start a Flume service to execute the task if it detects that the ZooKeeper cluster contains a task scheduled to itself.
Further, the ZooKeeper cluster is specifically configured to:
receive a primary-node election request initiated by a Master node in response to a preset trigger event and, after a successful election, make that Master node the primary node, wherein the preset trigger event is one of the following:
the Master node is started;
the Master node currently serving as the primary node fails.
Further, the primary node is specifically configured to:
determine, according to the host running-state information reported by each of the multiple Worker nodes, the target Worker node whose host running state is the best among the multiple Worker nodes;
generate configuration information indicating that the task is scheduled to the target Worker node.
Further, the primary node is also specifically configured to:
adjust the configuration information of the tasks according to the host running-state information and the task execution-state information reported by each of the multiple Worker nodes;
wherein the adjusted configuration information indicates that tasks in an idle state are to be scaled down and tasks in a backlogged state are to be scaled up; and
that tasks on Worker nodes whose host load is overloaded are to be migrated to Worker nodes whose host load is idle for execution.
Further, the primary node is also specifically configured to:
receive an offline request for a service through the external interface; and
write the offline information of the service and the offline information of the tasks allocated to the service to the ZooKeeper cluster, so that the Worker nodes executing those tasks stop the Flume service.
It should be noted that the data flow processing system provided by the above embodiment is described only in terms of the division into the functional modules above. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the system may be divided into different functional modules to perform all or part of the functions described above. In addition, the data flow processing system and the data flow processing method embodiments above belong to the same concept; for the specific implementation process and beneficial effects, see the method embodiment, which is not repeated here.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (10)

1. A data flow processing method, characterized in that the method comprises:
determining, by a ZooKeeper cluster, one of several Master nodes as the primary node;
providing, by the primary node, an external interface to receive an online request for a service, and allocating a task to the service; and
generating configuration information for the task according to the host running-state information reported by each of multiple Worker nodes, and writing it to the ZooKeeper cluster, the configuration information including information indicating the Worker node scheduled to execute the task;
if the Worker node detects that the ZooKeeper cluster contains a task scheduled to itself, starting a Flume service to execute the task.
2. The method according to claim 1, characterized in that determining one of the several Master nodes as the primary node by the ZooKeeper cluster comprises:
the ZooKeeper cluster receiving a primary-node election request initiated by a Master node in response to a preset trigger event and, after a successful election, making that Master node the primary node, wherein the preset trigger event is one of the following:
the Master node is started;
the Master node currently serving as the primary node fails.
3. The method according to claim 1, characterized in that generating the configuration information of the task according to the host running-state information reported by each of the multiple Worker nodes comprises:
determining, according to the host running-state information reported by each of the multiple Worker nodes, the target Worker node with the best host running state among the multiple Worker nodes;
generating configuration information indicating that the task is scheduled to the target Worker node.
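The claim does not define what makes a host running state "optimal". The following sketch therefore assumes one possible rule, an equally weighted mix of CPU and memory utilisation where the lowest score wins. The `HostState` fields, the weights, and the shape of the configuration dictionary are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    worker: str
    cpu: float     # CPU utilisation in [0, 1]
    memory: float  # memory utilisation in [0, 1]


def score(state: HostState) -> float:
    # Assumed metric: lower combined utilisation is better; the weights are illustrative.
    return 0.5 * state.cpu + 0.5 * state.memory


def build_config(task: str, reports: list[HostState]) -> dict:
    # Pick the worker with the best (lowest) score and name it in the task config.
    target = min(reports, key=score)
    return {"task": task, "worker": target.worker}


reports = [
    HostState("w1", 0.80, 0.40),
    HostState("w2", 0.30, 0.50),
    HostState("w3", 0.60, 0.90),
]
config = build_config("clickstream", reports)
```

Any other ranking (for example one that also counts running tasks or disk I/O) could be substituted by changing `score` alone.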
4. The method according to claim 1, characterized in that the method further comprises:
adjusting, by the primary node, the configuration information of the task according to the host running-state information and the task execution state information reported by each of the multiple Worker nodes;
wherein the adjusted configuration information of the task indicates that tasks in an idle state are to be scaled in and tasks in a backlog state are to be scaled out; and
that tasks on Worker nodes whose host load is in an overloaded state are to be migrated to Worker nodes whose host load is in an idle state for execution.
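A minimal sketch of such an adjustment pass is shown below, under stated assumptions: task and host states are given as plain strings, scaling changes an instance count by one, a task never drops below one instance, and every task on an overloaded host moves to the first idle host. None of these specifics come from the claims.

```python
def adjust(tasks, hosts):
    """Return an adjusted copy of the task configuration.

    tasks: {task: {"worker": str, "instances": int, "state": "idle" | "backlog" | "normal"}}
    hosts: {worker: "overloaded" | "idle" | "normal"}
    """
    idle_hosts = [worker for worker, load in hosts.items() if load == "idle"]
    adjusted = {}
    for name, cfg in tasks.items():
        cfg = dict(cfg)  # do not mutate the caller's configuration
        if cfg["state"] == "idle":
            cfg["instances"] = max(1, cfg["instances"] - 1)  # scale in
        elif cfg["state"] == "backlog":
            cfg["instances"] += 1  # scale out
        if hosts.get(cfg["worker"]) == "overloaded" and idle_hosts:
            cfg["worker"] = idle_hosts[0]  # migrate away from the overloaded host
        adjusted[name] = cfg
    return adjusted


tasks = {
    "t1": {"worker": "w1", "instances": 3, "state": "idle"},
    "t2": {"worker": "w1", "instances": 1, "state": "backlog"},
    "t3": {"worker": "w2", "instances": 2, "state": "normal"},
}
hosts = {"w1": "overloaded", "w2": "normal", "w3": "idle"}
result = adjust(tasks, hosts)
```

A production scheduler would also update the host load estimates as tasks move, so that a single idle host is not itself overloaded by the migration; the sketch omits this.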
5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
receiving, by the primary node, an offline request for the service through the external interface; and
writing offline information for the service and offline information for the tasks allocated to the service into the ZooKeeper cluster, so that the Worker nodes executing the tasks stop the Flume service.
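The offline path can be sketched in the same spirit. `Store` is a hypothetical in-memory substitute for the ZooKeeper cluster, the string `"offline"` is an assumed marker value, and stopping a Flume service is simulated by removing the task from a set.

```python
class Store:
    """Hypothetical in-memory stand-in for the ZooKeeper cluster (no real client API)."""

    def __init__(self):
        self.nodes = {}
        self.listeners = []

    def write(self, path, value):
        self.nodes[path] = value
        for listener in list(self.listeners):
            listener(path, value)


class Worker:
    def __init__(self, name, store, tasks):
        self.name = name
        self.running = set(tasks)  # tasks with a simulated Flume service running
        store.listeners.append(self.on_change)

    def on_change(self, path, value):
        # Stop the simulated Flume service when a task path is marked offline.
        if path.startswith("/tasks/") and value == "offline":
            self.running.discard(path.rsplit("/", 1)[-1])


def take_offline(store, service, service_tasks):
    # Record the offline state of the service, then of each task allocated to it.
    store.write(f"/services/{service}", "offline")
    for task in service_tasks:
        store.write(f"/tasks/{task}", "offline")


store = Store()
w1 = Worker("w1", store, ["pay-1", "search-1"])
w2 = Worker("w2", store, ["pay-2"])
take_offline(store, "pay", ["pay-1", "pay-2"])
```

Tasks that belong to other services, such as `search-1` here, keep running because no offline marker is written for them.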
6. A data flow processing system, characterized in that the system comprises a ZooKeeper cluster, several Master nodes, and multiple Worker nodes, wherein:
the ZooKeeper cluster is configured to determine one of the several Master nodes as the primary node;
the primary node is configured to provide an external interface to receive an online (go-live) request for a service, and to allocate a task to the service;
the primary node is further configured to generate configuration information for the task according to the current state information reported by each of the multiple Worker nodes, and to write the configuration information into the ZooKeeper cluster, the configuration information including information indicating the Worker node scheduled to execute the task;
each Worker node is configured to start a Flume service to execute the task if it detects, by listening to the ZooKeeper cluster, that a task scheduled to itself exists there.
7. The system according to claim 6, characterized in that the ZooKeeper cluster is specifically configured to:
receive a primary-node election request initiated by a Master node in response to a preset trigger event, and make that Master node the primary node once the election succeeds, wherein the preset trigger event is one of the following:
the Master node is started;
the Master node currently serving as the primary node fails.
8. The system according to claim 6, characterized in that the primary node is specifically configured to:
determine, according to the host running-state information reported by each of the multiple Worker nodes, the target Worker node with the best host running state among the multiple Worker nodes;
generate configuration information indicating that the task is scheduled to the target Worker node.
9. The system according to claim 6, characterized in that the primary node is further specifically configured to:
adjust the configuration information of the task according to the host running-state information and the task execution state information reported by each of the multiple Worker nodes;
wherein the adjusted configuration information of the task indicates that tasks in an idle state are to be scaled in and tasks in a backlog state are to be scaled out; and
that tasks on Worker nodes whose host load is in an overloaded state are to be migrated to Worker nodes whose host load is in an idle state for execution.
10. The system according to any one of claims 6 to 9, characterized in that the primary node is further specifically configured to:
receive an offline request for the service through the external interface; and
write offline information for the service and offline information for the tasks allocated to the service into the ZooKeeper cluster, so that the Worker nodes executing the tasks stop the Flume service.
CN201910048043.XA 2019-01-18 2019-01-18 A kind of data flow processing method and system Pending CN109857558A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201910048043.XA CN109857558A (en) 2019-01-18 2019-01-18 A kind of data flow processing method and system
CA3168286A CA3168286A1 (en) 2019-01-18 2019-09-19 Data flow processing method and system
PCT/CN2019/106779 WO2020147330A1 (en) 2019-01-18 2019-09-19 Data stream processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910048043.XA CN109857558A (en) 2019-01-18 2019-01-18 A kind of data flow processing method and system

Publications (1)

Publication Number Publication Date
CN109857558A true CN109857558A (en) 2019-06-07

Family

ID=66895175

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910048043.XA Pending CN109857558A (en) 2019-01-18 2019-01-18 A kind of data flow processing method and system

Country Status (3)

Country Link
CN (1) CN109857558A (en)
CA (1) CA3168286A1 (en)
WO (1) WO2020147330A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052095B (en) * 2020-09-11 2024-04-19 成都锋卫科技有限公司 Distributed high-availability big data mining task scheduling system
CN112416550B (en) * 2020-11-19 2024-04-05 广州探途网络技术有限公司 Communication method of crawler scheduling management platform and crawler scheduling management platform system
CN113342508B (en) * 2021-07-07 2024-08-23 湖南快乐阳光互动娱乐传媒有限公司 Task scheduling method and device
CN113934782A (en) * 2021-09-22 2022-01-14 易联众智鼎(厦门)科技有限公司 DAG model-based data ETL system and using method
CN114844799A (en) * 2022-05-27 2022-08-02 深信服科技股份有限公司 Cluster management method and device, host equipment and readable storage medium
CN117076257B (en) * 2023-09-14 2024-03-05 研华科技(中国)有限公司 Management method, management server and management system based on server cluster

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521044A (en) * 2011-12-30 2012-06-27 北京拓明科技有限公司 Distributed task scheduling method and system based on messaging middleware
CN103595651A (en) * 2013-10-15 2014-02-19 北京航空航天大学 Distributed data stream processing method and system
CN106375342A (en) * 2016-10-21 2017-02-01 用友网络科技股份有限公司 Zookeeper-technology-based system cluster method and system
KR101858565B1 (en) * 2016-02-19 2018-05-16 서영준 Independent parallel processing method for massive data in distributed platform and system of thereof
CN108241534A (en) * 2016-12-27 2018-07-03 阿里巴巴集团控股有限公司 A kind of task processing, distribution, management, the method calculated and device
CN108304255A (en) * 2017-12-29 2018-07-20 北京城市网邻信息技术有限公司 Distributed task dispatching method and device, electronic equipment and readable storage medium storing program for executing

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030195962A1 (en) * 2002-04-10 2003-10-16 Satoshi Kikuchi Load balancing of servers
CN105939389A (en) * 2016-06-29 2016-09-14 乐视控股(北京)有限公司 Load balancing method and device
CN108228393A (en) * 2017-12-14 2018-06-29 浙江航天恒嘉数据科技有限公司 A kind of implementation method of expansible big data High Availabitity
CN109857558A (en) * 2019-01-18 2019-06-07 苏宁易购集团股份有限公司 A kind of data flow processing method and system

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020147330A1 (en) * 2019-01-18 2020-07-23 苏宁云计算有限公司 Data stream processing method and system
CN110262882A (en) * 2019-06-17 2019-09-20 北京思特奇信息技术股份有限公司 A kind of distributed communication command scheduling system and method
CN111078396B (en) * 2019-11-22 2023-12-19 厦门安胜网络科技有限公司 Distributed data access method and system based on multitasking examples
CN111078396A (en) * 2019-11-22 2020-04-28 厦门安胜网络科技有限公司 Distributed data access method and system based on multitask instances
CN111447097A (en) * 2020-04-20 2020-07-24 国网甘肃省电力公司信息通信公司 Cloud platform resource scheduling management method and system
CN113010307A (en) * 2021-02-25 2021-06-22 成都库珀区块链科技有限公司 Multi-chain blockchain browser system and using method thereof
CN113010307B (en) * 2021-02-25 2024-04-05 库珀科技集团有限公司 Multi-chain blockchain browser system and application method thereof
CN113204418A (en) * 2021-05-19 2021-08-03 中国建设银行股份有限公司 Task scheduling method and device, electronic equipment and storage medium
CN113364864A (en) * 2021-06-03 2021-09-07 上海微盟企业发展有限公司 Server data synchronization method, system and storage medium
CN113254010A (en) * 2021-07-09 2021-08-13 广州光点信息科技有限公司 Visual DAG workflow task scheduling system and operation method thereof
CN114124959A (en) * 2021-12-06 2022-03-01 天地伟业技术有限公司 Data processing device of cloud streaming media service and cloud streaming media cluster
CN114697328A (en) * 2022-03-25 2022-07-01 浪潮云信息技术股份公司 Method and system for realizing NiFi high-availability cluster mode
CN114884948A (en) * 2022-05-05 2022-08-09 零氪科技(北京)有限公司 Data processing system
CN115002122A (en) * 2022-05-09 2022-09-02 中盈优创资讯科技有限公司 Cluster management method and device for data acquisition

Also Published As

Publication number Publication date
CA3168286A1 (en) 2020-07-23
WO2020147330A1 (en) 2020-07-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190607