CN110278279A

CN110278279A - Offline big data scheduling development platform and method with a dynamic resource scheduling mechanism

Info

Publication number: CN110278279A
Application number: CN201910564917.7A
Authority: CN
Inventors: 张梦龙; 裴宝山; 李翔; 祁洁
Original assignee: Suning Consumption Finance Co Ltd
Current assignee: Suning Consumption Finance Co Ltd
Priority date: 2019-06-27
Filing date: 2019-06-27
Publication date: 2019-09-24

Abstract

The present invention relates to an offline big data scheduling development platform with a dynamic resource scheduling mechanism. The platform comprises a client, a resource allocation master node module and several task execution managers. The client is connected to the resource allocation master node module and to each task execution manager. Each task execution manager is connected to the resource allocation master node module. It communicates with that module at regular intervals to report the current resource usage of its server, and it periodically updates its own resource usage information in a ZooKeeper node. The invention significantly reduces the single-point-of-failure risk of the master node. It also solves a problem of the old framework, in which the main load on a worker node was monitoring the running status of the sub-programs under a task, yet the node could not handle them. If the machine running a task fails, the task can be assigned to another machine and restarted, which greatly improves the fault tolerance of the platform.

Description

Offline big data scheduling development platform and method with a dynamic resource scheduling mechanism

Technical field

The present invention relates to the technical field of resource scheduling, and in particular to an offline big data scheduling development platform and method with a dynamic resource scheduling mechanism.

Background art

With the rapid growth of Internet companies, the demand for computation driven by data-intensive applications keeps increasing. Considering factors such as server hardware utilization, operation and maintenance cost, and data sharing, companies want to deploy all types of computing demand on one public cluster, so that the workloads share the cluster's resources and use them in a unified way, while each task is isolated from the others by a resource isolation mechanism. This is how the scheduling development platform came about. Traditional scheduling development platforms, however, perform poorly in three respects: scalability, reliability and resource utilization.

In a traditional scheduling development platform, a single master node (Master) handles both resource management and job control, and this part becomes the biggest bottleneck of the system. The node does too much work and consumes too many resources: when there are very many user programs, it incurs a large memory overhead and potentially also raises the risk of distributing user programs. This is why the industry generally concludes that this architecture supports an upper limit of about 4,000 node hosts.

On the slave nodes (Slave), simply using the number of tasks to represent resources is too crude, because it does not account for CPU and memory usage. If two tasks with large memory consumption are scheduled together on the same node, memory overflow easily occurs.

To solve these problems while allowing business scenarios to migrate seamlessly, we separated the functions of the master node into two components: resource management and task scheduling. The resource manager (ResourceMaster) globally manages the allocation of computing resources to all applications, and the Executor of each application is responsible for its own scheduling and coordination. This design greatly reduces the resource consumption of the Master and allows the programs that monitor the state of each sub-program to be distributed. Resources are expressed in units of memory and CPU, which is more reasonable than the earlier allocation by number of tasks.
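The difference between counting tasks and expressing resources as memory and CPU can be illustrated with a toy placement experiment. This is a minimal Python sketch under assumed numbers: the `Node` class, the functions `place_by_count` and `place_by_resources`, and all capacities are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical worker with a fixed CPU (cores) and memory (GB) capacity."""
    name: str
    cpu: float
    mem_gb: float
    tasks: list = field(default_factory=list)  # (name, cpu, mem_gb) tuples

    def used(self):
        # Total (cpu, mem) demand of the tasks already placed on this node.
        return sum(t[1] for t in self.tasks), sum(t[2] for t in self.tasks)

    def fits(self, cpu, mem_gb):
        used_cpu, used_mem = self.used()
        return used_cpu + cpu <= self.cpu and used_mem + mem_gb <= self.mem_gb


def place_by_count(nodes, task):
    """Old-style rule: pick the node running the fewest tasks, ignoring demand."""
    node = min(nodes, key=lambda n: len(n.tasks))
    node.tasks.append(task)
    return node


def place_by_resources(nodes, task):
    """Resource-aware rule: only nodes whose free CPU and memory cover the task."""
    _, cpu, mem = task
    candidates = [n for n in nodes if n.fits(cpu, mem)]
    if not candidates:
        return None  # no node can host the task without overcommitting
    # Prefer the node with the most free memory left.
    node = max(candidates, key=lambda n: n.mem_gb - n.used()[1])
    node.tasks.append(task)
    return node


def overloaded(nodes):
    """Names of nodes whose placed memory exceeds capacity (memory overflow risk)."""
    return [n.name for n in nodes if n.used()[1] > n.mem_gb]


def run(rule):
    nodes = [Node("A", 8, 16), Node("B", 8, 16)]
    for task in [("big1", 2, 12), ("small1", 1, 1), ("big2", 2, 12)]:
        rule(nodes, task)
    return overloaded(nodes)


print(run(place_by_count))      # ['A']
print(run(place_by_resources))  # []
```

With the count-based rule the two 12 GB tasks end up on the same 16 GB node, which is the memory overflow case described for the old Slave nodes; the resource-aware rule places the second large task where it actually fits.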

Summary of the invention

The technical problem to be solved by the invention is to provide an offline big data scheduling development platform and method with a dynamic resource scheduling mechanism.

To solve the above technical problem, the technical solution of the present invention is as follows. An offline big data scheduling development platform with a dynamic resource scheduling mechanism is provided, characterized in that it comprises a client, a resource allocation master node module and several task execution managers. The client is connected to the resource allocation master node module and to each of the task execution managers. Each task execution manager is connected to the resource allocation master node module; it communicates with that module at regular intervals to report the current resource usage of its server, and it periodically updates its own resource usage information in a ZooKeeper node.
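The periodic reporting described above can be sketched in Python as follows. Because no live ZooKeeper ensemble is assumed, a plain dictionary stands in for the znode tree; the path layout `/workers/<host>`, the JSON payload, the class names and the 10-second liveness timeout are all hypothetical. In a real deployment the write would go through a ZooKeeper client, typically to an ephemeral znode that disappears when the worker's session ends.

```python
import json
import time


class FakeZooKeeper:
    """Hypothetical in-memory stand-in for a ZooKeeper znode tree (path -> bytes)."""

    def __init__(self):
        self.nodes = {}

    def set(self, path, data: bytes):
        self.nodes[path] = data

    def get(self, path) -> bytes:
        return self.nodes[path]

    def children(self, prefix):
        return sorted(p[len(prefix):] for p in self.nodes if p.startswith(prefix))


class WorkerManager:
    """Task execution manager that periodically publishes its resource usage."""

    def __init__(self, host, zk, cpu_total, mem_total_gb):
        self.host, self.zk = host, zk
        self.cpu_total, self.mem_total_gb = cpu_total, mem_total_gb
        self.cpu_used = 0.0
        self.mem_used_gb = 0.0

    def heartbeat(self, now=None):
        # One report per call; a real worker would call this on a timer.
        report = {
            "host": self.host,
            "cpu_free": self.cpu_total - self.cpu_used,
            "mem_free_gb": self.mem_total_gb - self.mem_used_gb,
            "ts": time.time() if now is None else now,
        }
        self.zk.set(f"/workers/{self.host}", json.dumps(report).encode())
        return report


def live_workers(zk, now, timeout_s=10.0):
    """Master side: workers whose last report is at most timeout_s seconds old."""
    alive = []
    for host in zk.children("/workers/"):
        report = json.loads(zk.get(f"/workers/{host}"))
        if now - report["ts"] <= timeout_s:
            alive.append(host)
    return alive


zk = FakeZooKeeper()
WorkerManager("w1", zk, 8, 32).heartbeat(now=100.0)
WorkerManager("w2", zk, 8, 32).heartbeat(now=85.0)
print(live_workers(zk, now=100.0))  # ['w1']
```

Here `w2` has been silent for 15 seconds and is treated as lost; the same reports give the master the free CPU and memory it needs for allocation decisions.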

Further, each task execution manager includes several resource management containers, and each resource management container includes a task executor module. The task executor module is connected to the resource allocation master node module and obtains its task ID from it. Through the task ID, the task execution manager can monitor the task executor in real time, and the user can view the task's running status in real time in the web client. The task executor module is also connected to the client and exposes an interface to it for submitting tasks, killing tasks and checking task status.
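The client-facing interface (submit, kill, check status) can be sketched as a small in-memory service keyed by task ID. This Python sketch is illustrative only: the class `ExecutorInterface`, its method names, the `task-N` ID format and the state names are hypothetical. In the platform these calls would be remote endpoints rather than local methods.

```python
import itertools
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "SUBMITTED"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    KILLED = "KILLED"


TERMINAL = {TaskState.SUCCEEDED, TaskState.FAILED, TaskState.KILLED}


class ExecutorInterface:
    """Hypothetical interface exposed to the client: submit, kill, query by task ID."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._states = {}
        self._commands = {}

    def submit(self, command: str) -> str:
        task_id = f"task-{next(self._ids)}"
        self._commands[task_id] = command
        self._states[task_id] = TaskState.SUBMITTED
        return task_id

    def mark_running(self, task_id: str) -> None:
        # Called internally once a container has started the user program.
        if task_id not in self._states:
            raise KeyError(task_id)
        self._states[task_id] = TaskState.RUNNING

    def kill(self, task_id: str) -> bool:
        state = self._states.get(task_id)
        if state is None or state in TERMINAL:
            return False  # unknown task, or it has already ended
        self._states[task_id] = TaskState.KILLED
        return True

    def status(self, task_id: str) -> str:
        state = self._states.get(task_id)
        return "UNKNOWN" if state is None else state.value


api = ExecutorInterface()
tid = api.submit("run_etl.sh")
api.mark_running(tid)
print(api.status(tid))  # RUNNING
print(api.kill(tid))    # True
print(api.kill(tid))    # False
print(api.status(tid))  # KILLED
```

Refusing to kill a task that is already in a terminal state keeps the status shown to the user consistent with what actually happened.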

Further, the resource allocation master node module includes a scheduler and a resource manager. The scheduler triggers the start of tasks at scheduled times, and the resource manager allocates the CPU, memory and bandwidth resources of the servers in the cluster.
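The scheduler's timed triggering can be sketched with a priority queue ordered by next trigger time. This Python sketch assumes fixed-period schedules and uses the task-flow priority as a tie-breaker when trigger times coincide; `FlowSchedule`, `TimedTrigger` and the flow names are hypothetical, and the patent does not specify how schedules are expressed.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class FlowSchedule:
    name: str
    first_run: int     # first trigger time in seconds (hypothetical epoch)
    interval_s: int    # re-trigger period, e.g. 86400 for a daily offline job
    priority: int = 0  # higher value fires first when trigger times coincide


class TimedTrigger:
    """Hypothetical timed trigger: returns the task flows whose start time has come."""

    def __init__(self, flows):
        self._seq = itertools.count()  # tie-breaker so FlowSchedule is never compared
        self._heap = []
        for f in flows:
            if f.interval_s <= 0:
                raise ValueError("interval_s must be positive")
            self._push(f.first_run, f)

    def _push(self, when, flow):
        # Order by trigger time, then by descending priority, then by insertion.
        heapq.heappush(self._heap, (when, -flow.priority, next(self._seq), flow))

    def due(self, now):
        fired = []
        while self._heap and self._heap[0][0] <= now:
            when, _, _, flow = heapq.heappop(self._heap)
            fired.append(flow.name)
            # Re-arm for the first period after `now`, skipping missed periods.
            nxt = when + flow.interval_s
            while nxt <= now:
                nxt += flow.interval_s
            self._push(nxt, flow)
        return fired


trigger = TimedTrigger([
    FlowSchedule("daily_report", first_run=0, interval_s=86400, priority=1),
    FlowSchedule("risk_features", first_run=0, interval_s=86400, priority=5),
    FlowSchedule("hourly_sync", first_run=3600, interval_s=3600),
])
print(trigger.due(0))     # ['risk_features', 'daily_report']
print(trigger.due(3600))  # ['hourly_sync']
```

Skipping missed periods means a flow fires at most once per check even if the scheduler was paused; a platform that must back-fill missed runs would instead emit one trigger per missed period.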

Further, the number of resource management containers in each task execution manager is determined by the scheduler's policy.
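The patent leaves the scheduler's policy open. One plausible policy, shown here as a hypothetical Python sketch, sizes containers in CPU and memory and lets the scarcest dimension on a worker decide how many fit, optionally capped by a per-worker maximum. `ContainerSpec`, `containers_for_worker` and `max_per_worker` are illustrative names, not the patent's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContainerSpec:
    cpu: float     # cores per container
    mem_gb: float  # memory per container, in GB


def containers_for_worker(cpu_free: float, mem_free_gb: float,
                          spec: ContainerSpec,
                          max_per_worker: Optional[int] = None) -> int:
    """How many containers of `spec` fit on one worker (hypothetical policy)."""
    if spec.cpu <= 0 or spec.mem_gb <= 0:
        raise ValueError("container size must be positive")
    # The scarcest resource dimension limits the count.
    n = min(int(cpu_free // spec.cpu), int(mem_free_gb // spec.mem_gb))
    if max_per_worker is not None:
        n = min(n, max_per_worker)
    return max(n, 0)


print(containers_for_worker(16, 64, ContainerSpec(2, 12)))                    # 5
print(containers_for_worker(16, 64, ContainerSpec(2, 12), max_per_worker=4))  # 4
```

On a 16-core, 64 GB worker, memory rather than CPU is the limit for 12 GB containers (5 versus 8), which is why a count-only view of capacity can overcommit a machine.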

To solve the above technical problem, the technical solution of the present invention further provides an offline big data scheduling development method with a dynamic resource scheduling mechanism, characterized in that it comprises the following steps:

(1) The client creates multiple tasks according to business requirements and adds them to a task flow; it configures the alert policy and contingency mechanism of the tasks and the scheduling time and priority of the task flow, and sends the configuration information of each task to the resource allocation master node module;

(2) For each task received from the client, the resource allocation master node module allocates a container holding resource information, i.e. a resource management container, and stores the task ID; for each task it communicates with the corresponding task execution manager, which then starts the corresponding task executor module;

(3) After the task executor module starts, it registers with the resource allocation master node module;

(4) After registering successfully with the resource allocation master node module, the task executor module applies to that module for resources for the task and obtains them;

(5) After receiving the resource application from the task executor module, the resource allocation master node module assigns the task to the task executor module through its internal resource manager. Once the task executor module has obtained the resources, it communicates with the corresponding task execution manager, which then starts the user application, i.e. starts executing the task;

(6) Each task executor module reports the execution status and progress of its task to the task execution manager over RPC, so that the task execution manager can monitor the running status of each task at any time and restart a task when it fails;

(7) After the task executor module finishes executing the current task, or the current task fails, it deregisters from the resource allocation master node module, and the task result is shown in the task log on the client; if the task fails, business personnel are notified according to the alert policy configured in step (1).
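Steps (2) to (7) describe a per-task life cycle that can be modelled as a small state machine. The Python sketch below is an illustration under assumptions: the phase names, the transition table, the `max_restarts` limit and the `alert` callback are hypothetical, and a restart is modelled as returning the same task to the running phase.

```python
from enum import Enum, auto


class Phase(Enum):
    LAUNCHED = auto()      # step (2): container allocated, executor started
    REGISTERED = auto()    # step (3): executor registered with the master
    GRANTED = auto()       # step (4): resources obtained
    RUNNING = auto()       # step (5): user program started
    SUCCEEDED = auto()     # step (7)
    FAILED = auto()        # step (7)
    DEREGISTERED = auto()  # step (7)


# Allowed transitions, following the order of the method steps.
TRANSITIONS = {
    Phase.LAUNCHED: {Phase.REGISTERED},
    Phase.REGISTERED: {Phase.GRANTED},
    Phase.GRANTED: {Phase.RUNNING},
    Phase.RUNNING: {Phase.SUCCEEDED, Phase.FAILED},
    Phase.SUCCEEDED: {Phase.DEREGISTERED},
    Phase.FAILED: {Phase.DEREGISTERED, Phase.RUNNING},  # restart, step (6)
    Phase.DEREGISTERED: set(),
}


class TaskLifecycle:
    """Hypothetical tracker for one task; `alert` notifies business personnel."""

    def __init__(self, task_id, max_restarts=2, alert=print):
        self.task_id = task_id
        self.phase = Phase.LAUNCHED
        self.restarts = 0
        self.max_restarts = max_restarts
        self.alert = alert

    def _move(self, new):
        if new not in TRANSITIONS[self.phase]:
            raise ValueError(f"{self.phase.name} -> {new.name} not allowed")
        self.phase = new

    def start(self):
        for p in (Phase.REGISTERED, Phase.GRANTED, Phase.RUNNING):
            self._move(p)

    def report(self, ok: bool):
        """Outcome of the current run, as reported by the executor."""
        self._move(Phase.SUCCEEDED if ok else Phase.FAILED)
        if not ok and self.restarts < self.max_restarts:
            self.restarts += 1
            self._move(Phase.RUNNING)  # restart the failed task
            return
        self._move(Phase.DEREGISTERED)
        if not ok:
            self.alert(f"task {self.task_id} failed after {self.restarts} restart(s)")


alerts = []
t = TaskLifecycle("task-7", max_restarts=1, alert=alerts.append)
t.start()
t.report(ok=False)   # first failure: restarted
t.report(ok=False)   # second failure: deregistered, alert raised
print(t.phase.name)  # DEREGISTERED
print(alerts)        # ['task task-7 failed after 1 restart(s)']
```

An explicit transition table makes illegal orders (for example, running before resources are granted) fail loudly instead of silently corrupting task state.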

Further, in step (4), the task executor module applies for and obtains resources from the resource allocation master node module by polling. The task executor module and the resource allocation master node module communicate over RPC, and the RPC communication is implemented with Apache Thrift.
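Polling for resources can be sketched in Python as a bounded retry loop around the request call. In the platform the request would be an RPC over Apache Thrift; here a plain callable stands in for it so that no Thrift-generated client names are invented. `poll_for_resources`, `FakeMaster`, the grant dictionary and the default interval and attempt count are hypothetical.

```python
import time
from typing import Callable, Optional


def poll_for_resources(request: Callable[[], Optional[dict]],
                       interval_s: float = 1.0,
                       max_attempts: int = 30,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Ask the master for resources until a grant arrives or attempts run out.

    `request` stands in for the RPC call to the resource allocation master node;
    it returns a grant (a dict here) or None when resources are not yet free.
    """
    for attempt in range(1, max_attempts + 1):
        grant = request()
        if grant is not None:
            return grant
        if attempt < max_attempts:
            sleep(interval_s)  # wait before the next poll
    return None


class FakeMaster:
    """Hypothetical master that can only grant after `busy_rounds` refusals."""

    def __init__(self, busy_rounds):
        self.busy_rounds = busy_rounds
        self.calls = 0

    def allocate(self):
        self.calls += 1
        if self.calls <= self.busy_rounds:
            return None
        return {"container": "c-1", "cpu": 2, "mem_gb": 4}


master = FakeMaster(busy_rounds=2)
grant = poll_for_resources(master.allocate, interval_s=0.0, sleep=lambda s: None)
print(grant["container"], master.calls)  # c-1 3
```

Polling keeps the master stateless with respect to waiting executors, at the cost of up to one polling interval of extra latency and some additional RPC traffic.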

Further, in step (5), the task execution manager starts the user application as follows: it configures the running environment for the user program, including environment variables and binaries, writes the program's start command into an executable file, and starts the user application by running that executable file.
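The launch procedure (prepare the environment, write the start command into an executable file, run the file) can be sketched with Python's standard library. This assumes a POSIX system with `/bin/sh`. The script name `launch.sh` and the function `launch_user_program` are hypothetical, and the program runs synchronously here for simplicity, whereas a real task execution manager would start it in the background (for example with `subprocess.Popen`) and keep monitoring it.

```python
import os
import stat
import subprocess
import tempfile


def launch_user_program(workdir: str, start_command: str,
                        env_vars: dict) -> subprocess.CompletedProcess:
    """Write the start command into an executable script and run it.

    `workdir` is assumed to already contain the program's binaries; `env_vars`
    are the environment variables configured for the user program.
    """
    script = os.path.join(workdir, "launch.sh")  # hypothetical file name
    with open(script, "w") as f:
        f.write("#!/bin/sh\n")
        f.write(start_command + "\n")
    # Mark the script executable for its owner.
    os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)
    env = dict(os.environ)
    env.update(env_vars)
    return subprocess.run([script], cwd=workdir, env=env,
                          capture_output=True, text=True, check=False)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        result = launch_user_program(d, 'echo "job=$JOB_NAME"', {"JOB_NAME": "daily_etl"})
        print(result.returncode, result.stdout.strip())  # 0 job=daily_etl
```

Writing the command to a file, rather than passing it straight to a shell, leaves an inspectable artifact in the container's working directory that can be rerun as-is when a task is restarted.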

Further, in step (6), while the user program is running, the task executor module continuously shows the running status of the application to the user over RPC.

Compared with the prior art, the present invention has the following beneficial effects:

(1) The ResourceMaster separates the two existing functions of the master node, resource management and task scheduling/monitoring. The new resource manager globally manages the allocation of computing resources to all applications, and the Executor of each application is responsible for its own scheduling and coordination. Separating these functions significantly reduces the single-point-of-failure risk of the master node.

(2) The WorkerManager has a narrower role: it is responsible only for maintaining the state of program containers and for keeping a heartbeat with the ResourceMaster. The Executor is responsible for all work within the life cycle of a task, similar to the Slave node in the old framework. Note that although each task has its own Executor, the Executor may run on a machine other than the one hosting the ResourceMaster. This solves the problem in the old framework that the main load on a worker node was monitoring the running status of the sub-programs under a task, which it could not handle. If the machine running a task fails, the task can be assigned to another machine and restarted, which greatly improves the fault tolerance of the platform.
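Reassigning tasks from a failed machine can be sketched as follows, assuming the set of live workers has already been derived from heartbeat timeouts. The slot-based capacity model, the least-loaded placement rule and the function name `reassign_tasks` are hypothetical simplifications for this Python sketch; the patent does not state how the new machine is chosen.

```python
def reassign_tasks(assignments, alive, capacity):
    """Move tasks off dead workers onto live workers that still have free slots.

    assignments: task_id -> worker name
    alive: set of live workers (e.g. derived from heartbeat timeouts)
    capacity: worker -> maximum concurrent tasks (a hypothetical slot model)
    Returns (new_assignments, pending); pending tasks found no free slot.
    """
    # Keep tasks that are already on live workers.
    new = {t: w for t, w in assignments.items() if w in alive}
    load = {w: 0 for w in alive}
    for w in new.values():
        load[w] += 1
    pending = []
    for task in sorted(t for t, w in assignments.items() if w not in alive):
        options = [w for w in sorted(alive) if load[w] < capacity.get(w, 0)]
        if not options:
            pending.append(task)  # wait until some worker frees a slot
            continue
        target = min(options, key=lambda w: load[w])  # least-loaded live worker
        new[task] = target
        load[target] += 1
    return new, pending


assignments = {"t1": "w1", "t2": "w1", "t3": "w2", "t4": "w3"}
new, pending = reassign_tasks(assignments, alive={"w2", "w3"},
                              capacity={"w2": 2, "w3": 3})
print(new)      # {'t3': 'w2', 't4': 'w3', 't1': 'w2', 't2': 'w3'}
print(pending)  # []
```

Each reassigned task would then be restarted by a new Executor on its target machine; tasks left pending are retried once capacity frees up.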

Description of the drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a system structure diagram of the offline big data scheduling development platform with a dynamic resource scheduling mechanism of the present invention.

Fig. 2 is the system construction drawing of the task execution manager 1 in Fig. 1.

Specific embodiment

The technical solution of the present invention is described clearly and completely below through specific embodiments.

The present invention provides an offline big data scheduling development platform with a dynamic resource scheduling mechanism. As shown in Fig. 1, it comprises a client, a resource allocation master node module and several task execution managers, numbered task execution manager 1, task execution manager 2, …. The client is connected to the resource allocation master node module and to each task execution manager. Each task execution manager is connected to the resource allocation master node module; it communicates with that module at regular intervals to report the current resource usage of its server, and it periodically updates its own resource usage information in a ZooKeeper node.

The task execution managers of the invention are the servers managed by the resource allocation master node module, and each of them plays the same role, so task execution manager 1 in Fig. 1 is taken as an example. As shown in Fig. 2, task execution manager 1 includes several resource management containers, and each resource management container includes a task executor module. The task executor module is connected to the resource allocation master node module and obtains its task ID from it. Through the task ID, the task execution manager can monitor the task executor in real time, and the user can view the task's running status in real time in the web client. The task executor module is also connected to the client and exposes an interface to it for submitting tasks, killing tasks and checking task status.

The resource allocation master node module of the invention includes a scheduler and a resource manager. The scheduler triggers the start of tasks at scheduled times, and the resource manager allocates the CPU, memory and bandwidth resources of the servers in the cluster. The number of resource management containers in each task execution manager is determined by the scheduler's policy.

The technical solution of the present invention further provides an offline big data scheduling development method with a dynamic resource scheduling mechanism, which comprises the following steps:

(4) After registering successfully with the resource allocation master node module, the task executor module applies to that module for resources for the task and obtains them. The task executor module applies for and obtains resources by polling, and it communicates with the resource allocation master node module over RPC, implemented with Apache Thrift;

(5) After receiving the resource application from the task executor module, the resource allocation master node module assigns the task to the task executor module through its internal resource manager. Once the task executor module has obtained the resources, it communicates with the corresponding task execution manager, which then starts the user application, i.e. starts executing the task. The task execution manager starts the user application as follows: it configures the running environment for the user program, including environment variables and binaries, writes the program's start command into an executable file, and starts the user application by running that executable file;

(6) Each task executor module reports the execution status and progress of its task to the task execution manager over RPC, so that the task execution manager can monitor the running status of each task at any time and restart a task when it fails. While the user program is running, the task executor module continuously shows the running status of the application to the user over RPC;

(7) After the task executor module finishes executing the current task, or the current task fails, it deregisters from the resource allocation master node module, and the task result is shown in the task log on the client; if the task fails, business personnel are notified according to the alert policy configured in step (1).

The embodiments described above only describe preferred implementations of the present invention and do not limit its concept or scope. Without departing from the design concept of the invention, all variations and modifications made to its technical solution by ordinary engineers and technicians in this field shall fall within the protection scope of the invention. The technical content claimed by the invention is fully set out in the claims.

Claims

1. An offline big data scheduling development platform with a dynamic resource scheduling mechanism, characterized in that it comprises a client, a resource allocation master node module and several task execution managers; the client is connected to the resource allocation master node module and to each of the task execution managers; each task execution manager is connected to the resource allocation master node module and is used to communicate with the resource allocation master node module at regular intervals, report the current resource usage of its server, and periodically update its own resource usage information in a ZooKeeper node.

2. The offline big data scheduling development platform with a dynamic resource scheduling mechanism according to claim 1, characterized in that: each task execution manager includes several resource management containers, and each resource management container includes a task executor module; the task executor module is connected to the resource allocation master node module to obtain the task ID from it, so that the task execution manager monitors the task executor in real time through the task ID and the user can view the running status of the task in real time in the web client; the task executor module is also connected to the client and exposes an interface to the client, the interface being used for submitting tasks, killing tasks and checking task status.

3. The offline big data scheduling development platform with a dynamic resource scheduling mechanism according to claim 1, characterized in that: the resource allocation master node module includes a scheduler and a resource manager; the scheduler is used to trigger the start of tasks at scheduled times, and the resource manager is used to allocate the CPU, memory and bandwidth resources of the servers in the cluster.

4. The offline big data scheduling development platform with a dynamic resource scheduling mechanism according to claim 2, characterized in that: the number of resource management containers in each task execution manager is determined by the scheduler's policy.

5. An offline big data scheduling development method with a dynamic resource scheduling mechanism, characterized in that it comprises the following steps:

6. The offline big data scheduling development method with a dynamic resource scheduling mechanism according to claim 5, characterized in that: in step (4), the task executor module applies for and obtains resources from the resource allocation master node module by polling, and the task executor module and the resource allocation master node module communicate over RPC, the RPC communication being implemented with Apache Thrift.

7. The offline big data scheduling development method with a dynamic resource scheduling mechanism according to claim 5, characterized in that: in step (5), the task execution manager starts the user application by configuring the running environment for the user program, the running environment including environment variables and binaries, writing the program's start command into an executable file, and starting the user application by running the executable file.

8. The offline big data scheduling development method with a dynamic resource scheduling mechanism according to claim 5, characterized in that: in step (6), while the user program is running, the task executor module continuously shows the running status of the application to the user over RPC.