CN109857558A - Data flow processing method and system - Google Patents
Data flow processing method and system
- Publication number
- CN109857558A (application CN201910048043.XA)
- Authority
- CN
- China
- Prior art keywords
- node
- task
- host
- worker
- information
- Legal status
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Hardware Redundancy (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a data flow processing method and system in the field of big data processing. The method includes: determining, by a ZooKeeper cluster, one of several Master nodes as the primary node; providing, by the primary node, an external interface to receive business online requests, and allocating tasks for each business; generating, by the primary node, configuration information for each task according to the current state information reported by each of multiple Worker nodes, and writing it into the ZooKeeper cluster, the configuration information including information indicating the Worker node scheduled to execute the task; and, if a Worker node detects by listening that a task scheduled to it exists in the ZooKeeper cluster, starting a Flume service to execute the task. Embodiments of the invention provide high availability for both the Master nodes and the Worker nodes, improve the availability of the Flume service, and avoid uneven and wasteful resource use; they also greatly simplify taking businesses online and offline and reduce interference between businesses.
Description
Technical field
The present invention relates to the field of big data processing, and in particular to a data flow processing method and system.
Background art
In the prior art, a Flume service is usually started on every node of a cluster to transfer data from the data source (source) to the sink.
In implementing the present invention, the inventors found the following problems. First, when a node device suffers a routine fault (for example, a deadlock or a consumption anomaly), the system is unaware of it and manual handling is required, which delays troubleshooting. Second, all nodes in a cluster use the same configuration even though business data volumes differ widely across nodes, so Flume collection threads tend to sit idle a large fraction of the time. Third, new businesses go online frequently, each requiring manual modification of the business configuration file, and applying a modification requires restarting the whole cluster, which disrupts the normal execution of other businesses in the same cluster.
Summary of the invention
The present invention aims to solve at least one of the technical problems in the prior art or related art. To this end, the present invention provides a data flow processing method and system.
The specific technical solutions provided by the embodiments of the present invention are as follows:
In a first aspect, a data flow processing method is provided, the method comprising:
determining, by a ZooKeeper cluster, one of several Master nodes as the primary node;
providing, by the primary node, an external interface to receive a business online request, and allocating a task for the business; and
generating configuration information of the task according to the current state information reported by each of multiple Worker nodes, and writing it into the ZooKeeper cluster, the configuration information including information indicating the Worker node scheduled to execute the task;
if a Worker node detects, by listening, that a task scheduled to it exists in the ZooKeeper cluster, starting a Flume service to execute the task.
Further, determining, by the ZooKeeper cluster, one of several Master nodes as the primary node includes:
the ZooKeeper cluster receiving a primary-node election request initiated by a Master node based on a preset trigger event, and making that Master node the primary node after the election succeeds, wherein the preset trigger event is one of the following events:
the Master node is started;
the current Master node acting as the primary node fails.
Further, generating the configuration information of the task according to the current state information reported by each of the multiple Worker nodes includes:
determining, according to the host running-state information reported by each of the multiple Worker nodes, the target Worker node whose host running state is optimal among the multiple Worker nodes; and
generating configuration information indicating that the task is scheduled to the target Worker node.
Further, the method also includes:
adjusting, by the primary node, the configuration information of the task according to the host running-state information and task execution-status information reported by each of the multiple Worker nodes;
wherein the adjusted configuration information indicates that idle tasks are scaled in and backlogged tasks are scaled out; and
that tasks on Worker nodes whose host load is overloaded are migrated to Worker nodes whose host load is idle for execution.
Further, the method also includes:
receiving, by the primary node through the external interface, an offline request for the business; and
writing offline information of the business, and offline information of the tasks allocated to the business, into the ZooKeeper cluster, so that the Worker nodes executing those tasks stop the Flume service.
In a second aspect, a data flow processing system is provided. The system comprises a ZooKeeper cluster, several Master nodes and multiple Worker nodes, wherein:
the ZooKeeper cluster is configured to determine one of the several Master nodes as the primary node;
the primary node is configured to provide an external interface to receive business online requests and to allocate tasks for the business;
the primary node is further configured to generate configuration information of the task according to the current state information reported by each of the multiple Worker nodes and write it into the ZooKeeper cluster, the configuration information including information indicating the Worker node scheduled to execute the task;
the Worker node is configured to start a Flume service to execute a task if it detects, by listening, that a task scheduled to it exists in the ZooKeeper cluster.
Further, the ZooKeeper cluster is specifically configured to:
receive a primary-node election request initiated by a Master node based on a preset trigger event, and make that Master node the primary node after the election succeeds, wherein the preset trigger event is one of the following events:
the Master node is started;
the current Master node acting as the primary node fails.
Further, the primary node is specifically configured to:
determine, according to the host running-state information reported by each of the multiple Worker nodes, the target Worker node whose host running state is optimal among the multiple Worker nodes; and
generate configuration information indicating that the task is scheduled to the target Worker node.
Further, the primary node is specifically further configured to:
adjust the configuration information of the task according to the host running-state information and task execution-status information reported by each of the multiple Worker nodes;
wherein the adjusted configuration information indicates that idle tasks are scaled in and backlogged tasks are scaled out; and
that tasks on Worker nodes whose host load is overloaded are migrated to Worker nodes whose host load is idle for execution.
Further, the primary node is specifically further configured to:
receive an offline request for the business through the external interface; and
write offline information of the business, and offline information of the tasks allocated to the business, into the ZooKeeper cluster, so that the Worker nodes executing those tasks stop the Flume service.
The technical solutions provided by the embodiments of the present invention have the following beneficial effects:
1. Because the ZooKeeper cluster determines one of several Master nodes as the primary node, the Master nodes achieve high availability through the ZooKeeper cluster: if one Master node fails, another Master node can quickly take over and provide external service within a short time. This improves the availability of the Flume service and also solves the prior-art problem that routine faults on node devices are not handled promptly, which hurts processing timeliness.
2. Because the primary node provides an external interface, users can call it directly to take tasks online and offline, shortening a business online/offline operation to within 1 minute and greatly simplifying these operations. Moreover, updating a configuration requires neither manually editing the business configuration file nor restarting the cluster; only the affected business needs to be restarted, which reduces interference between businesses.
3. Because the primary node generates task configuration information from the host running-state information reported by the Worker nodes and writes it into the ZooKeeper cluster, and a Worker node starts a Flume service to execute any task it detects as scheduled to it, configuration is managed centrally through the ZooKeeper cluster. This avoids Flume collection threads sitting idle a large fraction of the time, solves the problem of uneven and wasteful resource use, and makes operation and maintenance more convenient.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a data flow processing method provided by Embodiment 1 of the present invention;
Fig. 2 is a block diagram of a data flow processing system provided by Embodiment 2 of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
In the description of the present application, it should be understood that the terms "first", "second" and the like are used for descriptive purposes only and shall not be construed as indicating or implying relative importance. In addition, unless otherwise stated, "multiple" means two or more.
Before the embodiments are introduced, several technical terms are briefly explained:
ZooKeeper: originally a subproject of Hadoop, it is a reliable coordination system for large-scale distributed systems. Its functions include configuration maintenance, naming services, distributed synchronization and group services.
Flume: a highly available, highly reliable, distributed technology for collecting, aggregating and transmitting massive amounts of log data.
An embodiment of the present invention provides a data flow processing method that can be executed by a data flow processing system. The system uses a distributed client/server architecture and includes a ZooKeeper cluster, several Master nodes and multiple Worker nodes. Each Master node and each Worker node must first register as a node in the ZooKeeper cluster so that the ZooKeeper cluster can manage all Master and Worker nodes centrally. Each Master node and each Worker node may be deployed on its own hardware server, and the numbers of Master and Worker nodes may be chosen by the user according to the business scenario; they are not specifically limited here. In the embodiments of the present invention, each Worker node starts a Flume service, which acts as a data transfer tool responsible for moving data from the data source (source) to the sink.
Fig. 1 is a flowchart of a data flow processing method provided by Embodiment 1 of the present invention. As shown in Fig. 1, the method may include the following steps:
101. Determine, through the ZooKeeper cluster, one of several Master nodes as the primary node.
For example, there may be two Master nodes: when one is determined as the primary node, the other is determined as the standby node. The Master node determined as the primary node provides external services: it exposes an external interface for taking businesses online and offline, viewing them and modifying them, and it is responsible for task scheduling.
Specifically, the ZooKeeper cluster receives a primary-node election request initiated by a Master node based on a preset trigger event, and makes that Master node the primary node if the election succeeds.
In one exemplary implementation, the preset trigger event is that the Master node is started.
Specifically, after a start command is executed, the Master node initiates a request to the ZooKeeper cluster to take part in the election of the primary node (i.e., the Leader node). If the ZooKeeper cluster determines that an Active Master node already exists as the primary node, the election fails and the ZooKeeper cluster records the state of this Master node as Standby. If no Active Master node exists as the primary node, the election succeeds and the ZooKeeper cluster records the state of this Master node as Active. The Master node in the Active state provides external services.
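The startup election can be sketched with the common ZooKeeper pattern of racing to create an ephemeral node: node creation is atomic, so exactly one Master succeeds and the rest become Standby. This is an illustration only. `InMemoryZnodes` is a hypothetical in-memory stand-in, not a real ZooKeeper client (a real deployment would use a client library such as Kazoo), and the `/Leader` and `/Master` paths assume the working directories described in this embodiment sit directly under the root.

```python
import threading


class InMemoryZnodes:
    """Hypothetical in-memory stand-in for ZooKeeper znodes (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.nodes = {}   # path -> data
        self.owners = {}  # ephemeral path -> owning session id

    def create_ephemeral(self, path, data, session):
        # As in ZooKeeper, creation is atomic: only the first creator succeeds.
        with self._lock:
            if path in self.nodes:
                return False
            self.nodes[path] = data
            self.owners[path] = session
            return True

    def set(self, path, data):
        with self._lock:
            self.nodes[path] = data

    def expire_session(self, session):
        # When a session ends, ZooKeeper deletes the ephemeral nodes it owned.
        with self._lock:
            for path in [p for p, s in self.owners.items() if s == session]:
                del self.nodes[path]
                del self.owners[path]


def elect_on_startup(zk, master_id, service_url):
    """Run when a Master node starts; return the state to record for it."""
    if zk.create_ephemeral("/Leader", master_id.encode(), session=master_id):
        # Publish the Active Master's Service URL for clients to discover.
        zk.set("/Master", service_url.encode())
        return "Active"
    return "Standby"
```

Because `/Leader` is ephemeral, it disappears when the Active Master's session expires, which is what opens the way for the failure-triggered election described next.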
In another exemplary implementation, the preset trigger event is that the current Master node acting as the primary node fails.
Specifically, if the current primary Master node fails, the ZooKeeper cluster obtains the fault information of that node, receives election requests from the other Master nodes, and determines one of them as the primary node within a preset time. The election may, for example, select the Master node with the best host performance among the remaining Master nodes; the embodiments of the present invention do not limit this.
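The failure-triggered election, using the "best host performance" option mentioned above, can be sketched as a selection over the requests that arrive within the preset window. The `performance` score, the timestamps and the tie-breaking rule are hypothetical; the description does not define how host performance is measured.

```python
from dataclasses import dataclass


@dataclass
class ElectionRequest:
    master_id: str
    performance: float  # hypothetical host-performance score; higher is better
    received_at: float  # seconds, on the same clock as window_start


def elect_after_failure(failed_id, requests, window_start, window_seconds):
    """Choose a new primary from election requests received within the preset window.

    Returns the winning master_id, or None if no eligible request arrived in time.
    """
    window_end = window_start + window_seconds
    candidates = [
        r for r in requests
        if r.master_id != failed_id and window_start <= r.received_at <= window_end
    ]
    if not candidates:
        return None
    # Best host performance wins; ties go to the lexicographically smallest id.
    best = min(candidates, key=lambda r: (-r.performance, r.master_id))
    return best.master_id
```

Returning `None` leaves it to the caller to retry or widen the window, a policy the description leaves open.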
It should be noted that once a Master node is elected as the primary node, it starts several functional modules, which may specifically include RestServer, RPCServer and Scheduler. RestServer handles creating, deleting, modifying and querying businesses; RPCServer receives the current state information reported by the Worker nodes; and Scheduler dynamically schedules and distributes Tasks.
In addition, the ZooKeeper cluster stores working directories, including but not limited to:
Leader: ephemeral node used to elect the primary among the Master nodes;
Master: stores the Service URL of the Active Master;
Jobs: directory node storing Job data;
Workers: ephemeral nodes used for Worker node discovery;
Assign/<worker-id>: list of tasks distributed to a Worker, i.e., one directory node per Worker node.
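The directory layout above can be captured as a few path helpers. This is a sketch under assumptions: the directories are assumed to sit directly under the ZooKeeper root, and the id format (for example `worker-31`) is hypothetical.

```python
# Hypothetical: the directories are assumed to sit directly under the ZooKeeper root.
LEADER = "/Leader"    # ephemeral node used to elect the primary Master
MASTER = "/Master"    # holds the Service URL of the Active Master
JOBS = "/Jobs"        # one child node per Job
WORKERS = "/Workers"  # one ephemeral child per live Worker (discovery)
ASSIGN = "/Assign"    # one directory per Worker listing its assigned Tasks


def _check_id(value):
    # A ZooKeeper path segment cannot contain "/", so reject such ids early.
    if not value or "/" in value:
        raise ValueError(f"invalid id: {value!r}")
    return value


def job_path(job_id):
    return f"{JOBS}/{_check_id(job_id)}"


def worker_path(worker_id):
    return f"{WORKERS}/{_check_id(worker_id)}"


def assignment_dir(worker_id):
    return f"{ASSIGN}/{_check_id(worker_id)}"


def assignment_path(worker_id, task_id):
    return f"{assignment_dir(worker_id)}/{_check_id(task_id)}"
```

Keeping one directory per Worker under `/Assign` means each Worker only needs to watch its own directory to learn about Tasks scheduled to it.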
In the embodiments of the present invention, the ZooKeeper cluster determines one of several Master nodes as the primary node, so the Master nodes achieve high availability through the ZooKeeper cluster. If one Master node fails, another can quickly take over and provide external service within a short time, which also solves the prior-art problem that routine faults on node devices are not handled promptly, hurting processing timeliness.
102. Provide, through the primary node, an external interface to receive business online requests, and allocate tasks for the business.
Specifically, the primary node determined from the several Master nodes provides an external interface for receiving online requests for businesses (i.e., Jobs). A user can call the external interface through a business application interface to bring a business online; the online request carries the business's configuration information, passed in JSON format. The primary node initializes the business parameters from this configuration information and allocates tasks for the business. A task (i.e., Task) is the execution unit of a Job on a Worker node: it reads data from the source and transfers it to the sink.
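A minimal sketch of the online step: parse the JSON configuration carried by the request, initialize the Job parameters, and allocate Tasks. The field names `job_id`, `parallelism`, `source` and `sink`, and the one-Task-per-unit-of-parallelism rule, are a hypothetical schema; the description only says the configuration is passed as JSON.

```python
import json
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    job_id: str
    source: dict
    sink: dict


def handle_online_request(body):
    """Parse a JSON Job configuration and allocate Tasks for it (hypothetical schema)."""
    config = json.loads(body)
    job_id = config["job_id"]
    # Initialize Job parameters; one Task per unit of parallelism (assumed default 1).
    parallelism = int(config.get("parallelism", 1))
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return [
        Task(f"{job_id}-task-{i}", job_id, config["source"], config["sink"])
        for i in range(parallelism)
    ]
```

Each returned Task would then be scheduled to a Worker and written under that Worker's assignment directory.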
In the embodiments of the present invention, because the primary node provides an external interface, users can call it directly to take tasks online and offline, which can shorten a business online/offline operation to within 1 minute and simplifies these operations. Moreover, updating a configuration requires neither manually editing the business configuration file nor restarting the cluster; only the affected business needs to be restarted, which reduces interference between businesses.
103. Generate, through the primary node, configuration information for the task according to the host running-state information reported by each of multiple Worker nodes, and write it into the ZooKeeper cluster; the configuration information includes information indicating the Worker node scheduled to execute the task.
The host running-state information includes one or more of CPU usage, memory usage, disk read/write activity, and inbound/outbound network traffic.
Specifically, this step may include:
determining, according to the host running-state information reported by the Worker nodes, the target Worker node whose host running state is optimal; and generating configuration information indicating that the task is scheduled to the target Worker node.
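"Optimal host running state" is not defined further, so the sketch below assumes one plausible reading: combine the four reported metrics into a weighted load score and pick the Worker with the lowest score. The weights and the normalization of disk and network metrics to 0.0-1.0 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    worker_id: str
    cpu: float      # CPU usage, 0.0-1.0
    memory: float   # memory usage, 0.0-1.0
    disk_io: float  # disk read/write utilization, assumed normalized to 0.0-1.0
    network: float  # network in/out utilization, assumed normalized to 0.0-1.0


# Hypothetical weights: the description does not say how the metrics are combined.
WEIGHTS = {"cpu": 0.4, "memory": 0.3, "disk_io": 0.15, "network": 0.15}


def load_score(state):
    """Weighted load; lower means a healthier host."""
    return sum(weight * getattr(state, name) for name, weight in WEIGHTS.items())


def pick_target_worker(states):
    """Return the id of the Worker whose host running state is best (lowest load)."""
    if not states:
        raise ValueError("no Worker has reported its host state")
    best = min(states, key=lambda s: (load_score(s), s.worker_id))
    return best.worker_id


def assignment_config(task_id, states):
    """Configuration fragment recording which Worker the Task is scheduled to."""
    return {"task_id": task_id, "worker_id": pick_target_worker(states)}
```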
In this embodiment, each Worker node starts a Report thread after being started. The thread monitors the running state of its own host, generates host running-state information, and reports it to the Master node acting as the primary node.
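The Report thread can be sketched as a daemon thread that samples host metrics and reports them on a fixed interval. The metric sampler and the transport to the primary (RPCServer in this embodiment) are injected as plain callables because the description does not specify them; the 5-second default interval is an assumption.

```python
import threading


class ReportThread(threading.Thread):
    """Periodically samples local host state and reports it to the primary node."""

    def __init__(self, worker_id, sample_fn, report_fn, interval_s=5.0):
        super().__init__(daemon=True)
        self.worker_id = worker_id
        self.sample_fn = sample_fn    # returns a dict of host metrics (hypothetical)
        self.report_fn = report_fn    # sends (worker_id, metrics) to the primary, e.g. over RPC
        self.interval_s = interval_s  # assumed reporting period
        self.stop_event = threading.Event()

    def run(self):
        while not self.stop_event.is_set():
            self.report_fn(self.worker_id, self.sample_fn())
            # wait() returns early as soon as stop() sets the event.
            self.stop_event.wait(self.interval_s)

    def stop(self):
        self.stop_event.set()
```

Using an `Event` for both the sleep and the stop signal lets the thread shut down promptly instead of finishing a full interval.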
104. If a Worker node detects, by listening, that a task scheduled to it exists in the ZooKeeper cluster, start a Flume service to execute the task.
Specifically, each Worker node monitors the state in the ZooKeeper cluster. When it detects a newly added task in the ZooKeeper cluster that is scheduled to itself, it obtains the task's configuration information from the ZooKeeper cluster and starts the Flume service to execute it; a Flume service is deployed on every Worker node.
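A Worker's listening loop can be sketched as a children watch on its own `/Assign/<worker-id>` directory that launches Flume for each Task it has not started yet. `InMemoryAssignments` is a hypothetical stand-in: real ZooKeeper watches fire once and must be re-registered, which client libraries such as Kazoo's `ChildrenWatch` handle for you. The `flume-ng agent --conf --conf-file --name` options are Flume's standard launcher flags; using the Task id as the agent name is an assumption and must match the agent prefix inside the configuration file.

```python
from typing import Callable, Dict, List


class InMemoryAssignments:
    """Hypothetical in-memory stand-in for /Assign/<worker-id> with children watches."""

    def __init__(self):
        self.children: Dict[str, Dict[str, dict]] = {}
        self.watchers: Dict[str, List[Callable[[List[str]], None]]] = {}

    def watch_children(self, path, callback):
        self.watchers.setdefault(path, []).append(callback)
        callback(sorted(self.children.get(path, {})))  # deliver the current state once

    def create(self, path, name, config):
        self.children.setdefault(path, {})[name] = config
        for callback in self.watchers.get(path, []):
            callback(sorted(self.children[path]))

    def get(self, path, name):
        return self.children[path][name]


def flume_command(conf_dir, conf_file, agent_name):
    # Standard flume-ng launcher options; agent_name must match the conf_file prefix.
    return ["flume-ng", "agent", "--conf", conf_dir,
            "--conf-file", conf_file, "--name", agent_name]


class Worker:
    def __init__(self, worker_id, store, launch):
        self.worker_id = worker_id
        self.store = store
        self.launch = launch  # e.g. subprocess.Popen in a real deployment
        self.running = set()  # Task ids already started on this Worker
        self.path = f"/Assign/{worker_id}"
        store.watch_children(self.path, self.on_assignments)

    def on_assignments(self, task_ids):
        # Only this Worker's directory is watched, so every child is scheduled to it.
        for task_id in task_ids:
            if task_id in self.running:
                continue
            config = self.store.get(self.path, task_id)
            self.launch(flume_command(config["conf_dir"], config["conf_file"], task_id))
            self.running.add(task_id)
```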
In the embodiments of the present invention, the primary node generates task configuration information and writes it into the ZooKeeper cluster, and a Worker node starts a Flume service to execute any task it detects as scheduled to it. Configuration is therefore managed centrally through the ZooKeeper cluster, which avoids Flume collection threads sitting idle a large fraction of the time, solves the problem of uneven and wasteful resource use, and makes operation and maintenance more convenient.
As a further preferred embodiment, the method provided by the embodiments of the present invention may further include:
adjusting, by the primary node, the configuration information of the tasks according to the host running-state information and task execution-status information reported by each of the Worker nodes;
wherein the adjusted configuration information indicates that idle tasks are scaled in and backlogged tasks are scaled out; and
that tasks on Worker nodes whose host load is overloaded are migrated to Worker nodes whose host load is idle for execution.
The task execution-status information includes the task execution speed and the task backlog.
Specifically, each Worker node may send the execution-status information of the tasks on that machine, together with its host status information, to the primary node.
The primary node identifies idle tasks and backlogged tasks from the task execution-status information, adds idle tasks (Idle Tasks) to the IdleTask queue and backlogged tasks (Busy Tasks) to the BusyTask queue, automatically scales in the IdleTasks, and actively scales out the BusyTasks.
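The IdleTask/BusyTask handling can be sketched as follows, assuming that "scaling" a Task means changing how many parallel Flume instances serve it. That interpretation, the `instances` field, and the idle/busy thresholds are hypothetical; the description gives no concrete values.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TaskStatus:
    task_id: str
    speed: float    # events processed per second
    backlog: int    # events accumulated but not yet processed
    instances: int  # hypothetical: parallel Flume instances currently serving the Task


# Hypothetical thresholds; the description gives no concrete values.
IDLE_SPEED = 10.0
BUSY_BACKLOG = 100_000


def classify(statuses):
    """Split Tasks into the IdleTask and BusyTask queues."""
    idle_tasks, busy_tasks = deque(), deque()
    for s in statuses:
        if s.backlog >= BUSY_BACKLOG:
            busy_tasks.append(s)
        elif s.backlog == 0 and s.speed < IDLE_SPEED:
            idle_tasks.append(s)
    return idle_tasks, busy_tasks


def rescale(idle_tasks, busy_tasks, max_instances=8):
    """Drain both queues: scale idle Tasks in and busy Tasks out by one instance."""
    plan = {}
    while idle_tasks:
        s = idle_tasks.popleft()
        plan[s.task_id] = max(1, s.instances - 1)
    while busy_tasks:
        s = busy_tasks.popleft()
        plan[s.task_id] = min(max_instances, s.instances + 1)
    return plan
```

Stepping by one instance per round is a conservative choice that avoids oscillation; a real scheduler might size the step from the backlog instead.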
The primary node identifies, from each Worker node's host running-state information, the Worker nodes whose host is overloaded and those whose host is idle, and migrates tasks from the overloaded Worker nodes to the idle ones for execution.
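Task migration can be sketched as a per-round plan that moves one Task off each overloaded Worker to an idle Worker, round-robin. The load thresholds and the one-Task-per-round policy are hypothetical. In the described system, applying a move would mean rewriting the Task's entry from the source Worker's assignment directory to the destination's.

```python
def plan_migrations(worker_load, assignments, high=0.8, low=0.3):
    """Plan one Task move per overloaded Worker for a single rebalancing round.

    worker_load: worker_id -> normalized host load in [0, 1]
    assignments: worker_id -> list of Task ids currently on that Worker
    high, low:   hypothetical thresholds for "overloaded" and "idle"
    Returns a list of (task_id, from_worker, to_worker) tuples.
    """
    if low >= high:
        raise ValueError("low threshold must be below high threshold")
    overloaded = sorted(
        (w for w, load in worker_load.items() if load >= high),
        key=lambda w: (-worker_load[w], w),  # most loaded first
    )
    idle = sorted(
        (w for w, load in worker_load.items() if load <= low),
        key=lambda w: (worker_load[w], w),  # least loaded first
    )
    moves = []
    if not idle:
        return moves  # nowhere to move work to in this round
    for i, src in enumerate(overloaded):
        tasks = assignments.get(src, [])
        if not tasks:
            continue
        # Spread moves across idle Workers round-robin.
        dst = idle[i % len(idle)]
        moves.append((tasks[-1], src, dst))
    return moves
```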
In the embodiments of the present invention, the primary node adjusts task configuration information according to the host running-state information and task execution-status information reported by the Worker nodes. When the host load of a Worker node is high and machine load across the cluster is unbalanced, Tasks on the heavily loaded Worker node are moved to other Worker nodes. This balances Worker node and Task load across the cluster, gives the Worker nodes high availability, improves the availability of the Flume service, and also solves the problem of uneven and wasteful resource use.
As a further preferred embodiment, the method provided by the embodiments of the present invention may further include:
receiving, by the primary node through the external interface, an offline request for a business; and
writing offline information of the business, and offline information of the tasks allocated to the business, into the ZooKeeper cluster, so that the Worker nodes executing those tasks stop the Flume service.
Specifically, the primary node provides an external interface to receive offline requests for businesses (i.e., Jobs). The primary node marks the status of the business being taken offline as offline, and then writes the offline information of the business, and of the tasks allocated to it, into the ZooKeeper cluster. When a Worker node executing one of these tasks detects the task's offline information in the ZooKeeper cluster, it executes the Task's stop command and shuts down the Flume service. Once all tasks related to the business have stopped, the business is fully offline.
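The offline bookkeeping on the primary can be sketched as two steps: mark the Job offline and publish a per-Task offline flag, then treat the Job as fully offline once every Task has reported that its Flume service stopped. The `JobRecord` structure and the flag format are hypothetical; in the described system the flags would be written to the ZooKeeper cluster rather than returned.

```python
from dataclasses import dataclass, field


@dataclass
class JobRecord:
    job_id: str
    task_ids: list
    status: str = "online"
    stopped: set = field(default_factory=set)


def request_offline(jobs, job_id):
    """Primary side: mark the Job offline and produce per-Task offline flags."""
    job = jobs[job_id]
    job.status = "offline"
    # In the described system these flags would be written into ZooKeeper.
    return {task_id: "offline" for task_id in job.task_ids}


def on_task_stopped(jobs, job_id, task_id):
    """Record that a Worker stopped the Flume service for task_id.

    Returns True once every Task of the Job has stopped (offline complete).
    """
    job = jobs[job_id]
    job.stopped.add(task_id)
    return job.stopped >= set(job.task_ids)
```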
Fig. 2 is a block diagram of a data flow processing system provided by Embodiment 2 of the present invention. The system includes a ZooKeeper cluster 10, several Master nodes 20 and multiple Worker nodes 30. As shown in Fig. 2, the number of Master nodes 20 may be configured as two, namely Master node 21 and Master node 22; when one Master node acts as the primary node, the other acts as the standby node. The number of Worker nodes 30 may be configured as three, namely Worker node 31, Worker node 32 and Worker node 33, wherein:
the ZooKeeper cluster is configured to determine one of the several Master nodes as the primary node;
the primary node is configured to provide an external interface to receive business online requests and to allocate tasks for the business;
the primary node is further configured to generate configuration information for the tasks according to the current state information reported by each of the Worker nodes and write it into the ZooKeeper cluster, the configuration information including information indicating the Worker node scheduled to execute each task;
a Worker node is configured to start a Flume service to execute a task if it detects, by listening, that a task scheduled to it exists in the ZooKeeper cluster.
Further, the ZooKeeper cluster is specifically configured to:
receive a primary-node election request initiated by a Master node based on a preset trigger event, and make that Master node the primary node after the election succeeds, wherein the preset trigger event is one of the following events:
the Master node is started;
the current Master node acting as the primary node fails.
Further, the primary node is specifically configured to:
determine, according to the host running-state information reported by each of the Worker nodes, the target Worker node whose host running state is optimal; and
generate configuration information indicating that the task is scheduled to the target Worker node.
Further, the primary node is specifically further configured to:
adjust the configuration information of the tasks according to the host running-state information and task execution-status information reported by each of the Worker nodes;
wherein the adjusted configuration information indicates that idle tasks are scaled in and backlogged tasks are scaled out; and
that tasks on Worker nodes whose host load is overloaded are migrated to Worker nodes whose host load is idle for execution.
Further, the primary node is specifically further configured to:
receive an offline request for a business through the external interface; and
write offline information of the business, and offline information of the tasks allocated to the business, into the ZooKeeper cluster, so that the Worker nodes executing those tasks stop the Flume service.
It should be understood that the data flow processing system provided in the above embodiment is described using the above division into functional modules only as an example. In practical applications, the above functions may be assigned to different functional modules as needed, i.e., the internal structure of the system may be divided into different functional modules to complete all or part of the functions described above. In addition, the data flow processing system and the data flow processing method embodiments share the same concept; for the specific implementation process and beneficial effects, see the method embodiment, which is not repeated here.
A person of ordinary skill in the art can understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disc.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A data flow processing method, characterized in that the method comprises:
determining, by a ZooKeeper cluster, one of several Master nodes as the primary node;
providing, by the primary node, an external interface to receive a business online request, and allocating a task for the business; and
generating configuration information of the task according to host running-state information reported by each of multiple Worker nodes, and writing it into the ZooKeeper cluster, the configuration information including information indicating the Worker node scheduled to execute the task;
if the Worker node detects, by listening, that a task scheduled to it exists in the ZooKeeper cluster, starting a Flume service to execute the task.
2. The method according to claim 1, characterized in that determining, by the ZooKeeper cluster, one of several Master nodes as the primary node comprises:
receiving, by the ZooKeeper cluster, a primary-node election request initiated by the Master node based on a preset trigger event, and making the Master node the primary node after the election succeeds, wherein the preset trigger event is one of the following events:
the Master node is started;
the current Master node acting as the primary node fails.
3. The method according to claim 1, wherein generating the configuration information of the task according to the host running state information respectively reported by the multiple Worker nodes comprises:
determining, according to the host running state information respectively reported by the multiple Worker nodes, a target Worker node having the best host running state among the multiple Worker nodes;
generating configuration information indicating that the task is scheduled to the target Worker node.
4. The method according to claim 1, wherein the method further comprises:
adjusting, by the host node, the configuration information of the task according to the host running state information and task execution state information respectively reported by the multiple Worker nodes;
wherein the adjusted configuration information of the task indicates: performing scale-in (capacity reduction) on tasks in an idle state, and performing scale-out (capacity expansion) on tasks in a backlogged state; and
migrating tasks on Worker nodes whose host load is overloaded to Worker nodes whose host load is idle, for execution there.
5. The method according to any one of claims 1 to 4, wherein the method further comprises:
receiving, by the host node through the external interface, an offline request for the service; and
writing offline information of the service and offline information of the task allocated for the service into the ZooKeeper cluster, so that the Worker node executing the task stops the Flume service.
6. A data flow processing system, characterized in that the system comprises a ZooKeeper cluster, several Master nodes, and multiple Worker nodes, wherein:
the ZooKeeper cluster is configured to determine one of the several Master nodes as a host node;
the host node is configured to provide an external interface to receive an online request for a service, and to allocate a task for the service;
the host node is further configured to generate configuration information of the task according to current state information respectively reported by the multiple Worker nodes and write the configuration information into the ZooKeeper cluster, wherein the configuration information includes information indicating the Worker node scheduled to execute the task;
the Worker node is configured to start a Flume service to execute the task if it detects, by listening to the ZooKeeper cluster, that a task scheduled to itself exists there.
7. The system according to claim 6, wherein the ZooKeeper cluster is specifically configured to:
receive a host node election request initiated by a Master node based on a preset trigger event, and make that Master node the host node after the election succeeds, wherein the preset trigger event is one of the following events:
the Master node is started;
the Master node currently serving as the host node fails.
8. The system according to claim 6, wherein the host node is specifically configured to:
determine, according to the host running state information respectively reported by the multiple Worker nodes, a target Worker node having the best host running state among the multiple Worker nodes;
generate configuration information indicating that the task is scheduled to the target Worker node.
9. The system according to claim 6, wherein the host node is further specifically configured to:
adjust the configuration information of the task according to the host running state information and task execution state information respectively reported by the multiple Worker nodes;
wherein the adjusted configuration information of the task indicates: performing scale-in (capacity reduction) on tasks in an idle state, and performing scale-out (capacity expansion) on tasks in a backlogged state; and
migrating tasks on Worker nodes whose host load is overloaded to Worker nodes whose host load is idle, for execution there.
10. The system according to any one of claims 6 to 9, wherein the host node is further specifically configured to:
receive an offline request for the service through the external interface; and
write offline information of the service and offline information of the task allocated for the service into the ZooKeeper cluster, so that the Worker node executing the task stops the Flume service.
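Claims 2 and 7 leave the election mechanism itself to ZooKeeper. One common way to realize it is the leader-election recipe described in the ZooKeeper documentation: each candidate Master node creates an ephemeral, sequential znode under an election path, and the candidate holding the lowest sequence number acts as the host node; when that node fails, its session expires, its ephemeral znode disappears, and the next-lowest candidate takes over. The Python sketch below simulates that recipe in memory so it runs without a ZooKeeper server. It is an illustrative assumption, not the patent's implementation: `FakeElectionPath`, the `/election` path, the `n_` node names and all method names are hypothetical, and a real client would also watch its predecessor znode rather than polling for the leader.

```python
from __future__ import annotations


class FakeElectionPath:
    """In-memory stand-in for a ZooKeeper election znode (hypothetical path /election)."""

    def __init__(self) -> None:
        self._seq = 0
        self._children: dict[str, str] = {}  # child znode name -> Master node id

    def create_ephemeral_sequential(self, master_id: str) -> str:
        """Trigger 1: a Master node starts and registers as a candidate."""
        # ZooKeeper appends a zero-padded, monotonically increasing 10-digit counter.
        name = f"n_{self._seq:010d}"
        self._seq += 1
        self._children[name] = master_id
        return name

    def session_expired(self, master_id: str) -> None:
        """Trigger 2: a Master node fails; its ephemeral znodes are deleted."""
        for name, owner in list(self._children.items()):
            if owner == master_id:
                del self._children[name]

    def leader(self) -> str | None:
        """The candidate holding the lowest sequence number acts as the host node."""
        if not self._children:
            return None
        # Zero padding makes lexicographic order match numeric order.
        return self._children[min(self._children)]


if __name__ == "__main__":
    election = FakeElectionPath()
    for master in ("master-a", "master-b", "master-c"):
        election.create_ephemeral_sequential(master)
    print(election.leader())  # master-a
    election.session_expired("master-a")
    print(election.leader())  # master-b
```

In this recipe the "election request" of claim 2 amounts to creating the candidate znode, and re-election after a failure needs no extra messages: the surviving candidates' znodes are already ordered.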
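The task lifecycle in claims 1, 3 and 5 (mirrored by system claims 6, 8 and 10) can be illustrated with an in-memory stand-in for the ZooKeeper cluster. In this sketch the host node treats the Worker with the lowest weighted CPU/memory load as the one with the best host running state, writes a JSON task configuration under a `/tasks/<task_id>` path, and each Worker reacts to writes by starting or stopping a simulated Flume service for tasks scheduled to itself. The load metrics, their 50/50 weighting, the znode paths, the JSON fields and all class names are assumptions; the patent does not specify them. Classic ZooKeeper watches are one-shot and must be re-registered after they fire, which this stand-in glosses over.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Callable

Watcher = Callable[[str, dict], None]


class FakeZooKeeper:
    """In-memory stand-in for the ZooKeeper cluster: path -> JSON bytes, plus watchers."""

    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}
        self.watchers: list[Watcher] = []

    def set(self, path: str, value: dict) -> None:
        self.data[path] = json.dumps(value).encode()
        for watcher in self.watchers:  # notify every listener of the write
            watcher(path, value)


@dataclass
class HostState:
    """Host running state reported by a Worker node (assumed fields, 0.0 to 1.0)."""
    cpu_load: float
    mem_load: float

    def score(self) -> float:
        return 0.5 * self.cpu_load + 0.5 * self.mem_load  # lower is better (assumed weighting)


@dataclass
class Worker:
    worker_id: str
    running: set[str] = field(default_factory=set)  # tasks with a (simulated) Flume service

    def on_change(self, path: str, config: dict) -> None:
        if not path.startswith("/tasks/") or config.get("worker") != self.worker_id:
            return  # not a task scheduled to this Worker
        task_id = path.rsplit("/", 1)[1]
        if config.get("status") == "online":
            self.running.add(task_id)      # a real Worker would start Flume here
        else:
            self.running.discard(task_id)  # and stop it here


class HostNode:
    def __init__(self, zk: FakeZooKeeper) -> None:
        self.zk = zk
        self.reports: dict[str, HostState] = {}
        self._next_task = 0

    def report(self, worker_id: str, state: HostState) -> None:
        self.reports[worker_id] = state

    def online(self, service: str) -> str:
        """External interface: accept an online request and schedule the service's task."""
        task_id = f"{service}-task-{self._next_task}"
        self._next_task += 1
        target = min(self.reports, key=lambda w: self.reports[w].score())  # best host state
        self.zk.set(f"/tasks/{task_id}",
                    {"service": service, "worker": target, "status": "online"})
        return task_id

    def offline(self, service: str, task_id: str) -> None:
        """External interface: write offline info for the service and its task."""
        path = f"/tasks/{task_id}"
        config = json.loads(self.zk.data[path])
        config["status"] = "offline"
        self.zk.set(f"/services/{service}", {"status": "offline"})
        self.zk.set(path, config)


if __name__ == "__main__":
    zk = FakeZooKeeper()
    host = HostNode(zk)
    w1, w2 = Worker("worker-1"), Worker("worker-2")
    zk.watchers += [w1.on_change, w2.on_change]
    host.report("worker-1", HostState(cpu_load=0.8, mem_load=0.6))
    host.report("worker-2", HostState(cpu_load=0.2, mem_load=0.3))
    task = host.online("orders-log")
    print(w2.running)  # {'orders-log-task-0'}
    host.offline("orders-log", task)
    print(w2.running)  # set()
```

A real Worker would, at the marked point, launch a Flume agent for the task (for example with the `flume-ng agent` command and a generated configuration file) instead of only recording the task id.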
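Claims 4 and 9 name three adjustments (scale-in for idle tasks, scale-out for backlogged tasks, and migration off overloaded hosts) without fixing thresholds or step sizes. The following Python function is a hypothetical one-round policy that produces adjusted task configurations: the 0.8/0.3 load thresholds, the `instances` field as the scaling knob, the one-instance step, the minimum of one instance, and the rule of migrating at most one task per overloaded host per round are illustrative choices, not taken from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

OVERLOADED = 0.8  # assumed: host load above this counts as overloaded
IDLE = 0.3        # assumed: host load below this counts as idle


@dataclass(frozen=True)
class TaskConfig:
    task_id: str
    worker: str      # Worker node the task is scheduled to
    instances: int   # parallelism of the task (assumed knob for scale-in/out)
    state: str       # task execution state: "idle", "normal" or "backlogged"


def rebalance(tasks: list[TaskConfig], host_load: dict[str, float]) -> list[TaskConfig]:
    """One round of a hypothetical adjustment policy for claims 4 and 9."""
    idle_hosts = sorted((w for w, load in host_load.items() if load < IDLE),
                        key=lambda w: host_load[w])
    moved_from: set[str] = set()
    next_idle = 0
    adjusted = []
    for task in tasks:
        instances = task.instances
        if task.state == "idle":
            instances = max(1, instances - 1)  # scale in, keeping at least one instance
        elif task.state == "backlogged":
            instances += 1                     # scale out
        worker = task.worker
        overloaded = host_load.get(worker, 0.0) > OVERLOADED
        if overloaded and worker not in moved_from and idle_hosts:
            moved_from.add(worker)             # migrate at most one task per host per round
            worker = idle_hosts[next_idle % len(idle_hosts)]  # round-robin over idle hosts
            next_idle += 1
        adjusted.append(replace(task, worker=worker, instances=instances))
    return adjusted


if __name__ == "__main__":
    tasks = [TaskConfig("a", "worker-1", 3, "idle"),
             TaskConfig("b", "worker-1", 1, "backlogged"),
             TaskConfig("c", "worker-2", 2, "normal")]
    loads = {"worker-1": 0.9, "worker-2": 0.5, "worker-3": 0.1}
    for t in rebalance(tasks, loads):
        print(t.task_id, t.worker, t.instances)
    # a worker-3 2
    # b worker-1 2
    # c worker-2 2
```

Moving at most one task per overloaded host per round is a deliberately conservative choice: it avoids shifting a whole host's workload onto a single idle host in one step, which could simply make that host overloaded in turn. The host node would write the returned configurations back into the ZooKeeper cluster so the affected Worker nodes pick them up.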
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910048043.XA CN109857558A (en) | 2019-01-18 | 2019-01-18 | A kind of data flow processing method and system |
CA3168286A CA3168286A1 (en) | 2019-01-18 | 2019-09-19 | Data flow processing method and system |
PCT/CN2019/106779 WO2020147330A1 (en) | 2019-01-18 | 2019-09-19 | Data stream processing method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910048043.XA CN109857558A (en) | 2019-01-18 | 2019-01-18 | A kind of data flow processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109857558A true CN109857558A (en) | 2019-06-07 |
Family ID: 66895175
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910048043.XA Pending CN109857558A (en) | 2019-01-18 | 2019-01-18 | A kind of data flow processing method and system |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN109857558A (en) |
CA (1) | CA3168286A1 (en) |
WO (1) | WO2020147330A1 (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110262882A (en) * | 2019-06-17 | 2019-09-20 | 北京思特奇信息技术股份有限公司 | A kind of distributed communication command scheduling system and method |
CN111078396A (en) * | 2019-11-22 | 2020-04-28 | 厦门安胜网络科技有限公司 | Distributed data access method and system based on multitask instances |
WO2020147330A1 (en) * | 2019-01-18 | 2020-07-23 | 苏宁云计算有限公司 | Data stream processing method and system |
CN111447097A (en) * | 2020-04-20 | 2020-07-24 | 国网甘肃省电力公司信息通信公司 | Cloud platform resource scheduling management method and system |
CN113010307A (en) * | 2021-02-25 | 2021-06-22 | 成都库珀区块链科技有限公司 | Multi-chain blockchain browser system and using method thereof |
CN113204418A (en) * | 2021-05-19 | 2021-08-03 | 中国建设银行股份有限公司 | Task scheduling method and device, electronic equipment and storage medium |
CN113254010A (en) * | 2021-07-09 | 2021-08-13 | 广州光点信息科技有限公司 | Visual DAG workflow task scheduling system and operation method thereof |
CN113364864A (en) * | 2021-06-03 | 2021-09-07 | 上海微盟企业发展有限公司 | Server data synchronization method, system and storage medium |
CN114124959A (en) * | 2021-12-06 | 2022-03-01 | 天地伟业技术有限公司 | Data processing device of cloud streaming media service and cloud streaming media cluster |
CN114697328A (en) * | 2022-03-25 | 2022-07-01 | 浪潮云信息技术股份公司 | Method and system for realizing NiFi high-availability cluster mode |
CN114884948A (en) * | 2022-05-05 | 2022-08-09 | 零氪科技(北京)有限公司 | Data processing system |
CN115002122A (en) * | 2022-05-09 | 2022-09-02 | 中盈优创资讯科技有限公司 | Cluster management method and device for data acquisition |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112052095B (en) * | 2020-09-11 | 2024-04-19 | 成都锋卫科技有限公司 | Distributed high-availability big data mining task scheduling system |
CN112416550B (en) * | 2020-11-19 | 2024-04-05 | 广州探途网络技术有限公司 | Communication method of crawler scheduling management platform and crawler scheduling management platform system |
CN113342508B (en) * | 2021-07-07 | 2024-08-23 | 湖南快乐阳光互动娱乐传媒有限公司 | Task scheduling method and device |
CN113934782A (en) * | 2021-09-22 | 2022-01-14 | 易联众智鼎(厦门)科技有限公司 | DAG model-based data ETL system and using method |
CN114844799A (en) * | 2022-05-27 | 2022-08-02 | 深信服科技股份有限公司 | Cluster management method and device, host equipment and readable storage medium |
CN117076257B (en) * | 2023-09-14 | 2024-03-05 | 研华科技(中国)有限公司 | Management method, management server and management system based on server cluster |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521044A (en) * | 2011-12-30 | 2012-06-27 | 北京拓明科技有限公司 | Distributed task scheduling method and system based on messaging middleware |
CN103595651A (en) * | 2013-10-15 | 2014-02-19 | 北京航空航天大学 | Distributed data stream processing method and system |
CN106375342A (en) * | 2016-10-21 | 2017-02-01 | 用友网络科技股份有限公司 | Zookeeper-technology-based system cluster method and system |
KR101858565B1 (en) * | 2016-02-19 | 2018-05-16 | 서영준 | Independent parallel processing method for massive data in distributed platform and system of thereof |
CN108241534A (en) * | 2016-12-27 | 2018-07-03 | 阿里巴巴集团控股有限公司 | A kind of task processing, distribution, management, the method calculated and device |
CN108304255A (en) * | 2017-12-29 | 2018-07-20 | 北京城市网邻信息技术有限公司 | Distributed task dispatching method and device, electronic equipment and readable storage medium storing program for executing |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030195962A1 (en) * | 2002-04-10 | 2003-10-16 | Satoshi Kikuchi | Load balancing of servers |
CN105939389A (en) * | 2016-06-29 | 2016-09-14 | 乐视控股(北京)有限公司 | Load balancing method and device |
CN108228393A (en) * | 2017-12-14 | 2018-06-29 | 浙江航天恒嘉数据科技有限公司 | A kind of implementation method of expansible big data High Availabitity |
CN109857558A (en) * | 2019-01-18 | 2019-06-07 | 苏宁易购集团股份有限公司 | A kind of data flow processing method and system |
Worldwide Applications
2019
- 2019-01-18 CN CN201910048043.XA (CN109857558A): Active, Pending
- 2019-09-19 CA CA3168286A (CA3168286A1): Active, Pending
- 2019-09-19 WO PCT/CN2019/106779 (WO2020147330A1): Active, Application Filing
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020147330A1 (en) * | 2019-01-18 | 2020-07-23 | 苏宁云计算有限公司 | Data stream processing method and system |
CN110262882A (en) * | 2019-06-17 | 2019-09-20 | 北京思特奇信息技术股份有限公司 | A kind of distributed communication command scheduling system and method |
CN111078396B (en) * | 2019-11-22 | 2023-12-19 | 厦门安胜网络科技有限公司 | Distributed data access method and system based on multitasking examples |
CN111078396A (en) * | 2019-11-22 | 2020-04-28 | 厦门安胜网络科技有限公司 | Distributed data access method and system based on multitask instances |
CN111447097A (en) * | 2020-04-20 | 2020-07-24 | 国网甘肃省电力公司信息通信公司 | Cloud platform resource scheduling management method and system |
CN113010307A (en) * | 2021-02-25 | 2021-06-22 | 成都库珀区块链科技有限公司 | Multi-chain blockchain browser system and using method thereof |
CN113010307B (en) * | 2021-02-25 | 2024-04-05 | 库珀科技集团有限公司 | Multi-chain blockchain browser system and application method thereof |
CN113204418A (en) * | 2021-05-19 | 2021-08-03 | 中国建设银行股份有限公司 | Task scheduling method and device, electronic equipment and storage medium |
CN113364864A (en) * | 2021-06-03 | 2021-09-07 | 上海微盟企业发展有限公司 | Server data synchronization method, system and storage medium |
CN113254010A (en) * | 2021-07-09 | 2021-08-13 | 广州光点信息科技有限公司 | Visual DAG workflow task scheduling system and operation method thereof |
CN114124959A (en) * | 2021-12-06 | 2022-03-01 | 天地伟业技术有限公司 | Data processing device of cloud streaming media service and cloud streaming media cluster |
CN114697328A (en) * | 2022-03-25 | 2022-07-01 | 浪潮云信息技术股份公司 | Method and system for realizing NiFi high-availability cluster mode |
CN114884948A (en) * | 2022-05-05 | 2022-08-09 | 零氪科技(北京)有限公司 | Data processing system |
CN115002122A (en) * | 2022-05-09 | 2022-09-02 | 中盈优创资讯科技有限公司 | Cluster management method and device for data acquisition |
Also Published As
Publication number | Publication date |
---|---|
CA3168286A1 (en) | 2020-07-23 |
WO2020147330A1 (en) | 2020-07-23 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190607 |