CN110134533B - System and method capable of scheduling data in batches - Google Patents

System and method capable of scheduling data in batches

Info

Publication number
CN110134533B
Authority
CN
China
Prior art keywords
node
scheduling
layer
nodes
project
Prior art date
Legal status
Active
Application number
CN201910399131.4A
Other languages
Chinese (zh)
Other versions
CN110134533A (en)
Inventor
黄清明
Current Assignee
Chongqing Tianpeng Network Co ltd
Original Assignee
Chongqing Tianpeng Network Co ltd
Priority date
Filing date
Publication date
Application filed by Chongqing Tianpeng Network Co ltd filed Critical Chongqing Tianpeng Network Co ltd
Priority to CN201910399131.4A priority Critical patent/CN110134533B/en
Publication of CN110134533A publication Critical patent/CN110134533A/en
Application granted granted Critical
Publication of CN110134533B publication Critical patent/CN110134533B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/541 Client-server

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of big data processing, and particularly relates to a system and a method capable of scheduling data in batches. The system comprises: a framework building unit for building a three-layer framework of the system; a project creating unit for acquiring project creation information from a user's secondary development and deploying multilevel scheduling nodes based on the project creation information; and an operation scheduling unit for performing load-balanced scheduling of batch tasks through the multilevel scheduling nodes. The invention can not only schedule data in batches but also allow manual intervention through settings; the load is balanced during scheduling and the scheduling control strategy is complete.

Description

System and method capable of scheduling data in batches
Technical Field
The invention belongs to the technical field of big data processing, and particularly relates to a system and a method capable of scheduling data in batches.
Background
In the big data era, data is gold: it is an important asset of society as a whole and of every enterprise and organization, so managing data well and using data well are important propositions for the whole of society. Before data can be used well, it must first be managed well, and batch scheduling automation is an important guarantee of good data management. In data warehouses, data marts and data pools of every size, batch scheduling automation is used to carry out, in an orderly and efficient way, tasks such as the ingestion, storage, cleaning, filtering, coarse processing and fine processing of large volumes of data.
Currently, the existing Azkaban scheduling tool can handle relatively complex scheduling tasks based on timed jobs, time intervals and dependency relations. However, the scale at which Azkaban can schedule is limited, and it suffers from drawbacks such as inflexible manual intervention, unbalanced scheduling load and an incomplete scheduling control strategy.
Disclosure of Invention
In view of the defects in the prior art, the invention provides a system and a method capable of scheduling data in batches, which can not only schedule data in batches but also allow manual intervention through settings, with balanced load during scheduling and a complete scheduling control strategy.
In a first aspect, the present invention provides a system capable of scheduling data in batches, including:
a framework building unit for building a three-layer framework of the system;
a project creating unit for acquiring project creation information from a user's secondary development and deploying the multilevel scheduling nodes based on the project creation information;
and an operation scheduling unit for performing load-balanced scheduling of batch tasks through the multilevel scheduling nodes.
The three-layer architecture comprises an application layer, a control layer and a target layer.
Wherein a three-layer architecture of the system is built by adopting a typical C/S mode.
Project creation information from a user's secondary development is acquired through the application layer, and the multilevel scheduling nodes of the control layer are deployed according to the project creation information.
In the running process of the project, the control layer performs load-balanced batch task scheduling on the target layer through a multi-stage scheduling node, and the target layer executes a corresponding task program according to the batch task scheduling of the control layer.
The application layer is a client, the control layer is a server, and the target layer is a task program deployed on the ETL server.
The control layer is of a multi-level pyramid structure and is composed of various different types of nodes, the control layer comprises EM nodes, Server nodes and Agent nodes, and the Agent nodes comprise MAGent nodes and SAgent nodes;
the EM node is used for communicating with the application layer, controlling the access authority of the application layer and managing and controlling the effective operation of all nodes;
the Server node is used for respectively communicating with the EM node and the Agent node and finishing scheduling control of the Agent node;
the Agent node is used for communicating with the target layer in a master-slave Agent cascade mode, carrying out load balancing deployment according to the resource use state of the ETL server of the target layer, and distributing tasks to the relatively idle ETL server to execute a task program.
The project creating information comprises project names, nodes in the project operation flow and connection relations among the nodes.
The application layer comprises an Admin module, a Designer module and a Monitor module;
the Admin module is used for managing and setting project names;
the Designer module is used for setting each node in the project operation flow and the connection relation among the nodes;
the Monitor module is used for operating the project and monitoring the operation flow of the project.
Each node is composed of a plurality of component processes with different functions; the nodes communicate with one another through sockets, and the component processes communicate with one another through message queues.
Wherein the component processes include FDC process, DRR process, DAR process, STR process, KIM process, NLS process, SPS process, CPG process, UCD process, EMR process, JMM process, DSY process, and FIM process.
In a second aspect, the present invention further provides an automatic implementation method of batch schedulable data, which is applicable to the system of batch schedulable data according to any one of claims 1 to 7, and is characterized by comprising the following steps:
building a three-layer architecture of the system by adopting a typical C/S mode, wherein the three-layer architecture comprises an application layer, a control layer and a target layer;
acquiring project creation information of secondary development of a user through the application layer, and deploying the multilevel scheduling nodes of the control layer according to the project creation information;
in the running process of a project, the control layer performs load-balanced batch task scheduling on the target layer through a multi-stage scheduling node, and the target layer executes a corresponding task program according to the batch task scheduling of the control layer.
The control layer is of a multi-level pyramid structure and is composed of various different types of nodes, the control layer comprises EM nodes, Server nodes and Agent nodes, and the Agent nodes comprise MAGent nodes and SAgent nodes;
the EM node is used for communicating with the application layer, controlling the access authority of the application layer and managing and controlling the effective operation of all nodes;
the Server node is used for respectively communicating with the EM node and the Agent node and finishing scheduling control of the Agent node;
the Agent node is used for communicating with the target layer in a master-slave Agent cascade mode, carrying out load balancing deployment according to the resource use state of the ETL server of the target layer, and distributing tasks to the relatively idle ETL server to execute a task program.
Each node consists of a plurality of component processes with different functions, communication is completed between the nodes through a Socket, and communication is completed between the component processes through a message queue mode;
the component processes include FDC process, DRR process, DAR process, STR process, KIM process, NLS process, SPS process, CPG process, UCD process, EMR process, JMM process, DSY process, and FIM process.
The embodiments of the invention can not only schedule data in batches but also allow manual intervention through settings; the load is balanced during scheduling and the scheduling control strategy is complete.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
FIG. 1 is a block diagram of a system for batch scheduling data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the three-layer architecture of the system in accordance with an embodiment of the present invention;
FIG. 3 is a flowchart of an automated implementation method for batch scheduling of data according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Embodiment one:
the embodiment provides a system capable of scheduling data in batches, as shown in fig. 1, including:
a framework building unit for building a three-layer framework of the system;
a project creating unit for acquiring project creation information from a user's secondary development and deploying the multilevel scheduling nodes based on the project creation information;
and an operation scheduling unit for performing load-balanced scheduling of batch tasks through the multilevel scheduling nodes.
The three-layer architecture of the system constructed in this embodiment is shown in FIG. 2, where the application layer is the client, the control layer is the server, and the target layer comprises the various task programs deployed on the ETL servers. Patent document 201520554128.2 discloses a big data processing platform network architecture that includes a core layer switch, an application virtualization server, a database cluster, a storage array, a backup server and at least one switch; the application virtualization server, the database cluster, the storage array and the backup server are each connected to the core layer switch, the storage array is connected to the switch, and the switch is connected to the application virtualization server and the database cluster. That scheme provides the hardware environment required for processing big data, and it is open and extensible. At present, large amounts of data are stored mainly in traditional SQL databases, which differ greatly from the NoSQL databases used by big data technology. At the same time, because of the variety of the data, the data must be imported into the big data platform's own storage system before the platform can process it, and this import generally requires ETL (data warehouse technology) processing first, completing the extraction, cleaning and loading of the various kinds of data.
ETL, an abbreviation of the English Extract-Transform-Load, describes the process of extracting (Extract), transforming (Transform) and loading (Load) data from a source end to a destination end. A traditional ETL tool places a dedicated transformation engine between the data source and the target data warehouse, and all transformation programs run on this dedicated engine.
From a functional point of view, the application layer of the invention is mainly divided into Admin, Designer and Monitor. The control layer has a multi-level pyramid structure: the top layer consists of service control nodes that handle the various kinds of scheduling service control and provide operation and application services to the client, while the agent layer handles the control interaction with the servers of the target layer. In addition, the agent layer can control the scheduling of servers deployed in a cluster and achieve load balancing through a master-slave agent cascade. The target layer comprises the objects controlled by the whole product, such as the ETL servers and job workstations.
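For illustration only, a control-layer topology of the kind described above could be written down as a small nested configuration. Everything in this sketch is an assumption introduced for readability: the field names (em, servers, magent, sagents, etl_hosts), the host names and the ports are not defined by the invention.

```python
# Hypothetical sketch of a control-layer topology: one EM node at the top,
# Server nodes below it, and master/slave agent cascades that front the
# ETL servers of the target layer. All names are illustrative only.
control_layer_topology = {
    "em": {"host": "em-node-01", "port": 9000},
    "servers": [
        {
            "host": "server-node-01",
            "port": 9100,
            "magent": {
                "host": "magent-01",
                "port": 9200,
                # The slave agents form the execution domain (cluster)
                # that the master agent load-balances over.
                "sagents": [
                    {"host": "sagent-01", "etl_hosts": ["etl-01", "etl-02"]},
                    {"host": "sagent-02", "etl_hosts": ["etl-03", "etl-04"]},
                ],
            },
        }
    ],
}
```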
In this embodiment, after the basic three-layer architecture has been built, a plurality of projects can be created through the application layer. After the projects are created, the control layer performs load-balanced batch task scheduling on the target layer according to the requirements of the running tasks while the projects run, and the target layer executes the corresponding task programs according to this scheduling.
The application layer in the embodiment comprises an Admin module, a Designer module, a Monitor module and the like;
the Admin module is used for managing and setting project names;
the Designer module is used for setting each node in the project operation flow and the connection relation among the nodes;
the Monitor module is used for operating the project and monitoring the operation flow of the project.
The project creation information described in this embodiment includes the project name, the nodes in the project operation flow, and the connection relations between the nodes. A user creates a specific project operation flow through Admin and Designer; after creation, the operation flow can be simulated and monitored through the Monitor.
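As a purely illustrative sketch, the project creation information (project name, nodes of the operation flow, and connections between nodes) could be held in a structure like the following; the class and field names are assumptions for this example, not an API defined by the patent.

```python
from dataclasses import dataclass, field

@dataclass
class ProjectDefinition:
    """Hypothetical container for project creation information: a project
    name, the nodes of the operation flow, and the directed connections
    (dependencies) between those nodes."""
    name: str
    nodes: list = field(default_factory=list)        # e.g. ["extract", "clean", "load"]
    connections: list = field(default_factory=list)  # e.g. [("extract", "clean")]

    def downstream_of(self, node: str) -> list:
        """Nodes that may only start after `node` has finished."""
        return [dst for src, dst in self.connections if src == node]

# Example: a three-step ETL operation flow created through Admin/Designer.
project = ProjectDefinition(
    name="daily_sales_etl",
    nodes=["extract", "clean", "load"],
    connections=[("extract", "clean"), ("clean", "load")],
)
print(project.downstream_of("extract"))  # ['clean']
```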
The control layer of this embodiment is a pyramid-shaped system formed by several different types of nodes: it includes the EM node (i.e., the core node), the Server nodes (i.e., the control nodes), and the Agent nodes (i.e., the proxy nodes), and the Agent nodes include the MAGent node (i.e., the master agent node) and the SAgent node (i.e., the slave agent node). These different types of nodes have different roles and functions.
The EM node is used for communicating with the application layer, controlling the access authority of the application layer and managing and controlling the effective operation of all nodes;
the Server node is used for respectively communicating with the EM node and the Agent node and finishing scheduling control of the Agent node;
the Agent node is used for communicating with the target layer in a master-slave Agent cascade mode, carrying out load balancing deployment according to the resource use state of the ETL server of the target layer, and distributing tasks to the relatively idle ETL server to execute a task program.
In this embodiment, the nodes communicate with one another through sockets. In actual operation, each transaction creates a connection, and this applies not only to communication between the client and the core node but to communication between all nodes. The core nodes are peers: each node can initiate a service request to any other node, so every node is both a client and a server. The control layer of this embodiment is a multilayer logical system expressed by deploying different nodes in different logical layers; this multilayer structure is not fixed, and the user can deploy the control layer flexibly according to the scale and requirements of the project, so the whole system can be as simple or as complex as needed.
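A minimal sketch of that peer-to-peer socket relationship is given below, assuming one JSON message per connection as the framing; the class name, the message shape and the echo "service" are placeholders, not the patent's protocol.

```python
import json
import socket
import threading

class Node:
    """Sketch of peer-to-peer node communication over sockets: every node
    listens for requests (acting as a server) and can also open a
    connection to any other node (acting as a client)."""

    def __init__(self, host: str, port: int):
        self.host, self.port = host, port

    def serve_forever(self):
        with socket.create_server((self.host, self.port)) as srv:
            while True:
                conn, _ = srv.accept()
                threading.Thread(target=self._handle, args=(conn,), daemon=True).start()

    def _handle(self, conn):
        with conn:
            request = json.loads(conn.recv(65536).decode())
            reply = {"status": "ok", "echo": request}  # placeholder service logic
            conn.sendall(json.dumps(reply).encode())

    def request(self, peer_host: str, peer_port: int, payload: dict) -> dict:
        with socket.create_connection((peer_host, peer_port)) as conn:
            conn.sendall(json.dumps(payload).encode())
            return json.loads(conn.recv(65536).decode())
```

In this sketch a node would run serve_forever() on a background thread while calling request() toward its peers, which mirrors the description above that every node is simultaneously a client and a server.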
Each node of the embodiment is composed of a plurality of component processes with different functions, and the component processes communicate with one another through message queues. The component processes include an FDC (Flow Dispatch Core) process, a DRR (Dispatch Request Router) process, a DAR (Dispatch Answer Router) process, an STR (Send Message To Remote) process, a KIM (Kernel Integrated Manager) process, an NLS (Net Listen) process, an SPS (Search Plugin State) process, a CPG (Call Plugin) process, a UCD (User Command Deal) process, an EMR (Kernel Event Manager And Release) process, a JMM (Job Mutex Manager) process, a DSY (Data Synchronous) process, and an FIM (Flow Instance Manager) process.
In this embodiment, different component processes have different functions, and in practical applications the user can select the required component processes according to the needs of the project. In order to realize both synchronous and asynchronous communication among the component processes, a request queue and a response queue are logically allocated to each component process on top of the physical message queue.
Request queue: the queue in which a component process receives request messages from other component processes; here the current process is the server that provides the service.
Response queue: the queue in which a component process receives response messages from the service processes it has called; here the current process is the client that requests the service.
Because each process has both a request queue and a response queue, each process can both provide and request services: when providing a service the component acts as a server, and when requesting a service it acts as a client. This mirrors the communication mechanism between core nodes, which is peer-to-peer; communication between the components inside a node is likewise peer-to-peer.
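The following sketch illustrates that per-component request/response pattern, using in-process queue.Queue objects as a stand-in for the physical message queue; the class, the message fields and the FDC-to-CPG example are assumptions made for the illustration.

```python
import queue

class ComponentProcess:
    """Sketch of the per-component messaging model: each component owns a
    request queue (where it acts as the server) and a response queue
    (replies to requests it issued as a client)."""

    def __init__(self, name: str):
        self.name = name
        self.request_queue = queue.Queue()   # requests from other components
        self.response_queue = queue.Queue()  # responses to our own requests

    def send_request(self, other: "ComponentProcess", body: dict):
        # Asynchronous: drop the request into the peer's request queue.
        other.request_queue.put({"from": self, "body": body})

    def serve_one(self):
        # Act as the server: take one request, do the work, reply.
        msg = self.request_queue.get()
        msg["from"].response_queue.put({"from": self.name, "result": f"handled {msg['body']}"})

# e.g. an FDC process asking a CPG process to run a plug-in:
fdc, cpg = ComponentProcess("FDC"), ComponentProcess("CPG")
fdc.send_request(cpg, {"op": "call_plugin", "plugin": "load_step"})
cpg.serve_one()
print(fdc.response_queue.get())
```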
In this embodiment, a load balancing mechanism is adopted when the ETL servers are scheduled; load-balanced deployment makes effective use of physical resources and improves ETL processing efficiency. It is realized mainly through the agent cascade: load balancing operates on a cluster, i.e. within the execution domain formed by a cascade of executing agents. Within a cluster, the task deployment on every ETL server must be identical, and the control layer automatically distributes tasks to the relatively idle ETL hosts, which execute the task programs, according to the resource usage of the ETL servers in the cluster.
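A minimal sketch of that dispatch decision follows. The patent only requires that the relatively idle machine be chosen; the concrete scoring rule (a per-host load counter) and all names here are assumptions for the example.

```python
def pick_idle_host(hosts: dict) -> str:
    """Return the relatively idle ETL host. `hosts` maps host name to its
    current load (e.g. a running-task count or a resource score reported
    by its agent); the scoring rule is an assumption of this sketch."""
    return min(hosts, key=hosts.get)

def dispatch(task: str, hosts: dict) -> str:
    host = pick_idle_host(hosts)
    hosts[host] += 1  # account for the newly assigned task
    return f"task '{task}' dispatched to {host}"

# The same task programs are deployed on every host of the execution
# domain, so any host can run the task; the least-loaded one is chosen.
cluster = {"etl-01": 3, "etl-02": 1, "etl-03": 2}
print(dispatch("clean_step", cluster))  # task 'clean_step' dispatched to etl-02
```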
In conclusion, the system can not only schedule data in batches but also allow manual intervention through settings; the load is balanced during scheduling and the scheduling control strategy is complete.
Embodiment two:
the embodiment provides an automatic implementation method for batch scheduling data, which is suitable for the system for batch scheduling data described in the first embodiment, and includes the following steps:
S1, building a three-layer architecture of the system by adopting a typical C/S mode, wherein the three-layer architecture comprises an application layer, a control layer and a target layer;
S2, acquiring project creation information from a user's secondary development through the application layer, and deploying the multilevel scheduling nodes of the control layer according to the project creation information;
S3, in the running process of the project, the control layer performs load-balanced batch task scheduling on the target layer through the multi-stage scheduling nodes, and the target layer executes the corresponding task program according to the batch task scheduling of the control layer.
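For illustration, the three steps above could be exercised end to end roughly as follows; every function and field name in this sketch is a placeholder introduced here, not an interface defined by the patent.

```python
def build_architecture() -> dict:
    """S1: describe the three layers of the typical C/S deployment."""
    return {
        "application": "client",
        "control": ["EM", "Server", "MAGent", "SAgent"],
        "target": ["etl-01", "etl-02"],  # hosts carrying the task programs
    }

def deploy_scheduling_nodes(project_info: dict, architecture: dict) -> list:
    """S2: deploy the multilevel scheduling nodes from the project creation info."""
    return [f"{level} node configured for project '{project_info['name']}'"
            for level in architecture["control"]]

def run_project(project_info: dict, architecture: dict) -> None:
    """S3: load-balanced scheduling - send each flow node to the idler ETL host."""
    load = {host: 0 for host in architecture["target"]}
    for step in project_info["nodes"]:
        host = min(load, key=load.get)  # relatively idle host
        load[host] += 1
        print(f"{step} -> {host}")

info = {"name": "daily_sales_etl", "nodes": ["extract", "clean", "load"]}
arch = build_architecture()
deploy_scheduling_nodes(info, arch)
run_project(info, arch)
```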
The three-layer architecture of the system constructed in this embodiment is shown in FIG. 2, where the application layer is the client, the control layer is the server, and the target layer comprises the various task programs deployed on the ETL servers.
From a functional point of view, the application layer is mainly divided into Admin, Designer and Monitor. The control layer has a multi-level pyramid structure: the top layer consists of service control nodes that handle the various kinds of scheduling service control and provide operation and application services to the client, while the agent layer handles the control interaction with the servers of the target layer. In addition, the agent layer can control the scheduling of servers deployed in a cluster and achieve load balancing through a master-slave agent cascade. The target layer comprises the objects controlled by the whole product, such as the ETL servers and job workstations.
In this embodiment, after the basic three-layer architecture has been built, a plurality of projects can be created through the application layer. After the projects are created, the control layer performs load-balanced batch task scheduling on the target layer according to the requirements of the running tasks while the projects run, and the target layer executes the corresponding task programs according to this scheduling.
The application layer in the embodiment comprises an Admin module, a Designer module, a Monitor module and the like;
the Admin module is used for managing and setting project names;
the Designer module is used for setting each node in the project operation flow and the connection relation among the nodes;
the Monitor module is used for operating the project and monitoring the operation flow of the project.
The project creation information described in this embodiment includes the project name, the nodes in the project operation flow, and the connection relations between the nodes. A user creates a specific project operation flow through Admin and Designer; after creation, the operation flow can be simulated and monitored through the Monitor.
The control layer of this embodiment is a pyramid-shaped system formed by several different types of nodes: it includes the EM node (i.e., the core node), the Server nodes (i.e., the control nodes), and the Agent nodes (i.e., the proxy nodes), and the Agent nodes include the MAGent node (i.e., the master agent node) and the SAgent node (i.e., the slave agent node). These different types of nodes have different roles and functions.
The EM node is used for communicating with the application layer, controlling the access authority of the application layer and managing and controlling the effective operation of all nodes;
the Server node is used for respectively communicating with the EM node and the Agent node and finishing scheduling control of the Agent node;
the Agent node is used for communicating with the target layer in a master-slave Agent cascade mode, carrying out load balancing deployment according to the resource use state of the ETL server of the target layer, and distributing tasks to the relatively idle ETL server to execute a task program.
In this embodiment, the nodes communicate with one another through sockets. In actual operation, each transaction creates a connection, and this applies not only to communication between the client and the core node but to communication between all nodes. The core nodes are peers: each node can initiate a service request to any other node, so every node is both a client and a server. The control layer of this embodiment is a multilayer logical system expressed by deploying different nodes in different logical layers; this multilayer structure is not fixed, and the user can deploy the control layer flexibly according to the scale and requirements of the project, so the whole system can be as simple or as complex as needed.
Each node of the embodiment is composed of a plurality of component processes with different functions, and the component processes communicate with one another through message queues. The component processes include an FDC (Flow Dispatch Core) process, a DRR (Dispatch Request Router) process, a DAR (Dispatch Answer Router) process, an STR (Send Message To Remote) process, a KIM (Kernel Integrated Manager) process, an NLS (Net Listen) process, an SPS (Search Plugin State) process, a CPG (Call Plugin) process, a UCD (User Command Deal) process, an EMR (Kernel Event Manager And Release) process, a JMM (Job Mutex Manager) process, a DSY (Data Synchronous) process, and an FIM (Flow Instance Manager) process.
In this embodiment, different component processes have different functions, and in practical applications the user can select the required component processes according to the needs of the project. In order to realize both synchronous and asynchronous communication among the component processes, a request queue and a response queue are logically allocated to each component process on top of the physical message queue.
Request queue: the queue in which a component process receives request messages from other component processes; here the current process is the server that provides the service.
Response queue: the queue in which a component process receives response messages from the service processes it has called; here the current process is the client that requests the service.
Because each process has both a request queue and a response queue, each process can both provide and request services: when providing a service the component acts as a server, and when requesting a service it acts as a client. This mirrors the communication mechanism between core nodes, which is peer-to-peer; communication between the components inside a node is likewise peer-to-peer.
In this embodiment, a load balancing mechanism is adopted when the ETL servers are scheduled; load-balanced deployment makes effective use of physical resources and improves ETL processing efficiency. It is realized mainly through the agent cascade: load balancing operates on a cluster, i.e. within the execution domain formed by a cascade of executing agents. Within a cluster, the task deployment on every ETL server must be identical, and the control layer automatically distributes tasks to the relatively idle ETL hosts, which execute the task programs, according to the resource usage of the ETL servers in the cluster.
In conclusion, the method can not only schedule data in batches but also allow manual intervention through settings; the load is balanced during scheduling and the scheduling control strategy is complete.
Those of ordinary skill in the art will appreciate that the systems and method steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed method and system may be implemented in other ways. For example, the division of the above steps is only one logic function division, and there may be another division manner in actual implementation, for example, multiple steps may be combined into one step, and one step or multiple steps may also be split into multiple steps. And part or all of the steps can be selected according to actual needs to achieve the aim of the scheme of the embodiment of the invention.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims (7)

1. A system for batch scheduling data, comprising: a framework building unit for building a three-layer framework of the system; the project creating unit is used for acquiring project creating information of secondary development of a user and deploying the multilevel scheduling nodes based on the project creating information; the operation scheduling unit is used for scheduling the batch tasks with balanced load through the multistage scheduling nodes; the three-layer architecture comprises an application layer, a control layer and a target layer; the control layer adopts a multi-level pyramid structure and is composed of various nodes of different types, the control layer comprises an EM node, a Server node and an Agent node, and the Agent node comprises an MAGent node and a SAgent node; the EM node is used for communicating with the application layer, controlling the access authority of the application layer and managing and controlling the effective operation of all nodes; the Server node is used for respectively communicating with the EM node and the Agent node and finishing scheduling control of the Agent node; the Agent node is used for communicating with a target layer in a master-slave Agent cascade mode, carrying out load balancing deployment according to the resource use state of the ETL server of the target layer and distributing tasks to the relatively idle ETL server to execute a task program; the MAGent node is a master agent node, and the SAgent node is a slave agent node.
2. The system capable of scheduling data in batches according to claim 1, wherein a three-layer architecture of the system is built by adopting a typical C/S mode.
3. The system capable of scheduling data in batches according to claim 2, wherein project creation information of secondary development of a user is acquired through the application layer, and the multi-level scheduling node of the control layer is deployed according to the project creation information.
4. The system of claim 2, wherein during the operation of the project, the control layer performs load-balanced batch task scheduling on the target layer through a multi-stage scheduling node, and the target layer executes a corresponding task program according to the batch task scheduling of the control layer.
5. The system of claim 2, wherein the application layer is a client, the control layer is a server, and the target layer is a task program deployed on the ETL server.
6. An automatic implementation method of batch schedulable data, which is applicable to the system of batch schedulable data of any one of claims 1-5, characterized by comprising the following steps: building a three-layer architecture of the system by adopting a typical C/S mode, wherein the three-layer architecture comprises an application layer, a control layer and a target layer; acquiring project creation information of secondary development of a user through the application layer, and deploying the multilevel scheduling nodes of the control layer according to the project creation information; in the running process of a project, the control layer performs load-balanced batch task scheduling on the target layer through a multi-stage scheduling node, and the target layer executes a corresponding task program according to the batch task scheduling of the control layer; the control layer adopts a multi-level pyramid structure and is composed of various nodes of different types, the control layer comprises an EM node, a Server node and an Agent node, and the Agent node comprises an MAGent node and a SAgent node; the EM node is used for communicating with the application layer, controlling the access authority of the application layer and managing and controlling the effective operation of all nodes; the Server node is used for respectively communicating with the EM node and the Agent node and finishing scheduling control of the Agent node; the Agent node is used for communicating with a target layer in a master-slave Agent cascade mode, carrying out load balancing deployment according to the resource use state of the ETL server of the target layer and distributing tasks to the relatively idle ETL server to execute a task program; the MAGent node is a master agent node, and the SAgent node is a slave agent node.
7. The method for realizing automation of data batch scheduling according to claim 6, wherein each node is composed of a plurality of component processes with different functions, the nodes complete communication through Socket, and the component processes complete communication through a message queue.
CN201910399131.4A 2019-05-14 2019-05-14 System and method capable of scheduling data in batches Active CN110134533B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910399131.4A CN110134533B (en) 2019-05-14 2019-05-14 System and method capable of scheduling data in batches

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910399131.4A CN110134533B (en) 2019-05-14 2019-05-14 System and method capable of scheduling data in batches

Publications (2)

Publication Number Publication Date
CN110134533A CN110134533A (en) 2019-08-16
CN110134533B true CN110134533B (en) 2020-04-28

Family

ID=67573989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910399131.4A Active CN110134533B (en) 2019-05-14 2019-05-14 System and method capable of scheduling data in batches

Country Status (1)

Country Link
CN (1) CN110134533B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761046A (en) * 2021-09-13 2021-12-07 中远海运科技股份有限公司 Workflow ETL-based processing method and system
CN114553956B (en) * 2022-01-04 2024-01-09 北京国电通网络技术有限公司 Data transmission method and system based on unified extensible firmware protocol (UEP) middleware

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101951411A (en) * 2010-10-13 2011-01-19 戴元顺 Cloud scheduling system and method and multistage cloud scheduling system
CN101957780A (en) * 2010-08-17 2011-01-26 中国电子科技集团公司第二十八研究所 Resource state information-based grid task scheduling processor and grid task scheduling processing method
CN104239144A (en) * 2014-09-22 2014-12-24 珠海许继芝电网自动化有限公司 Multilevel distributed task processing system
CN109254846A (en) * 2018-08-01 2019-01-22 国电南瑞科技股份有限公司 The dynamic dispatching method and system of CPU and GPU cooperated computing based on two-level scheduler
CN109743390A (en) * 2019-01-04 2019-05-10 深圳壹账通智能科技有限公司 Method for scheduling task, device, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8447872B2 (en) * 2006-11-01 2013-05-21 Intel Corporation Load balancing in a storage system
CN105703940B (en) * 2015-12-10 2021-08-20 中国电力科学研究院有限公司 Monitoring system and monitoring method for multi-level scheduling distributed parallel computation
US10275206B2 (en) * 2017-01-26 2019-04-30 Bandlab Plug-in load balancing

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101957780A (en) * 2010-08-17 2011-01-26 中国电子科技集团公司第二十八研究所 Resource state information-based grid task scheduling processor and grid task scheduling processing method
CN101951411A (en) * 2010-10-13 2011-01-19 戴元顺 Cloud scheduling system and method and multistage cloud scheduling system
CN104239144A (en) * 2014-09-22 2014-12-24 珠海许继芝电网自动化有限公司 Multilevel distributed task processing system
CN109254846A (en) * 2018-08-01 2019-01-22 国电南瑞科技股份有限公司 The dynamic dispatching method and system of CPU and GPU cooperated computing based on two-level scheduler
CN109743390A (en) * 2019-01-04 2019-05-10 深圳壹账通智能科技有限公司 Method for scheduling task, device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Distributed computing management platform for resource sharing adapted to multi-level dispatch security and stability analysis (适应多级调度安全稳定分析资源共享的分布式计算管理平台); Fang Yongjie (方勇杰); Automation of Electric Power Systems (《电力系统自动化》); 2016-12-10; full text *

Also Published As

Publication number Publication date
CN110134533A (en) 2019-08-16

Similar Documents

Publication Publication Date Title
Tsaregorodtsev et al. DIRAC: a community grid solution
US7760743B2 (en) Effective high availability cluster management and effective state propagation for failure recovery in high availability clusters
CN103414712B (en) A kind of distributed virtual desktop management system and method
US20170134526A1 (en) Seamless cluster servicing
WO2021147288A1 (en) Container cluster management method, device and system
CN106033373A (en) A method and a system for scheduling virtual machine resources in a cloud computing platform
CN101694709A (en) Service-oriented distributed work flow management system
CN113569987A (en) Model training method and device
CN112667362B (en) Method and system for deploying Kubernetes virtual machine cluster on Kubernetes
CN110838939B (en) Scheduling method based on lightweight container and edge Internet of things management platform
CN110134533B (en) System and method capable of scheduling data in batches
JP2019121240A (en) Workflow scheduling system, workflow scheduling method and electronic apparatus
CN112882828B (en) Method for managing and scheduling a processor in a processor-based SLURM operation scheduling system
CN110661842A (en) Resource scheduling management method, electronic equipment and storage medium
Kijsipongse et al. A hybrid GPU cluster and volunteer computing platform for scalable deep learning
CN110569113A (en) Method and system for scheduling distributed tasks and computer readable storage medium
CN114816694A (en) Multi-process cooperative RPA task scheduling method and device
CN109684028A (en) A kind of method, device and equipment that operating system is separated with user data
CN104484228A (en) Distributed parallel task processing system based on Intelli-DSC (Intelligence-Data Service Center)
Mahato et al. Dynamic and adaptive load balancing in transaction oriented grid service
CN111240824A (en) CPU resource scheduling method and electronic equipment
CN109450913A (en) A kind of multinode registration dispatching method based on strategy
CN109634749B (en) Distributed unified scheduling method and device
CN102214094A (en) Executing operations via asynchronous programming model
US20100122254A1 (en) Batch and application scheduler interface layer in a multiprocessor computing environment

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant