CN106293933A

CN106293933A - A cluster resource configuration and scheduling method supporting multiple big data computing frameworks

Info

Publication number: CN106293933A
Application number: CN201511000709.2A
Authority: CN
Inventors: 张京梅
Original assignee: Beijing Dian Zan Science And Technology Ltd
Current assignee: Beijing Dian Zan Science And Technology Ltd
Priority date: 2015-12-29
Filing date: 2015-12-29
Publication date: 2017-01-04

Abstract

The invention discloses a cluster resource configuration and scheduling method supporting multiple big data computing frameworks, comprising the following steps: a master control node collects all tradable computing resources of the compute nodes and pushes them to the computing framework schedulers, and the corresponding computing framework scheduler decides, by way of a contract transaction, whether to accept and use the resources; if the computing framework accepts the allocated resources, the framework's own distributed scheduling assigns computing tasks to the corresponding computing resources and notifies the master control node, and the corresponding computing framework executor is started to perform the computing tasks; if the computing framework declines, the resources are reallocated and resource transaction information continues to be sent to the computing framework; fine-grained fair allocation scheduling is performed over multiple computing resource types, where the resource allocation to each computing framework is determined by that framework's dominant resource (the resource it emphasizes most), and the dominant-resource shares obtained by the computing frameworks should be as equal as possible. The method improves the overall resource utilization of the cluster and the reliability and scalability of the computing service.

Description

A cluster resource configuration and scheduling method supporting multiple big data computing frameworks

Technical field

The invention relates to the configuration and scheduling of computing resources in computing clusters, and in particular to a cluster resource configuration and scheduling method supporting multiple big data computing frameworks.

Background art

With the development of industries such as cloud computing and big data, more and more data centers are being built, and they need more effective ways to reduce data center costs. The data centers of companies such as Facebook, Google and Amazon are expanding rapidly, and these companies are also looking for technologies that help reduce the cost of building, maintaining and upgrading data centers.

A data center is a specific network of globally cooperating devices used to transmit, accelerate, display, compute and store data over Internet infrastructure. Most electronic components in a data center are driven by low-voltage DC power supplies. The physical concerns of a data center are the servers themselves and the cabling used to connect those servers to the rest of the environment.

A cluster is a computer system in which a group of loosely integrated computers, connected by software and/or hardware, cooperate closely to carry out computing work. In a sense, they can be regarded as a single computer. The individual computers in a cluster are usually called nodes and are typically connected by a local area network, although other connection methods are possible. Cluster computers are commonly used to improve on the computing speed and/or reliability of a single computer. In general, a cluster offers a much better price/performance ratio than a single computer such as a workstation or a supercomputer.

A big data computing framework is a runtime and programming framework for distributed computing systems that process big data. For example, Storm is a distributed real-time computation system for processing high-speed, large data streams, and it adds reliable real-time data processing to Hadoop. Spark uses in-memory computing: starting from multi-iteration batch processing, it allows data to be loaded into memory and queried repeatedly, and it also combines several computing paradigms such as data warehousing, stream processing and graph computation. Spark is built on HDFS and integrates well with Hadoop. Hadoop is used for batch processing of massive data and for offline data processing; it is one of the current standards for big data computing, is used in many commercial application systems, and can readily integrate structured, semi-structured and even unstructured data sets.

Currently popular virtualization technology allows multiple applications or virtual machines to share one machine in order to improve server resource utilization. However, such sharing causes resource contention, which interferes with application performance and affects the response time of online applications. Fast service response time is a key measure of service quality and is key to satisfying and retaining users. This approach therefore inevitably harms customer satisfaction and lowers service quality.

To guarantee service quality, current data centers over-provision resources, at the expense of resource utilization. The waste takes two forms. One is that critical online applications monopolize a data center: for example, a data center is dedicated to running one or a few online applications, while other jobs run in other data centers to reduce interference with the online services. The other is overstating resource requirements.

Computing clusters in data centers have become the main computing platform on which big data relies. With the development of distributed computing, various big data computing frameworks have been created to solve different business problems, such as Hadoop, which suits large-scale offline batch processing, and Storm, which suits real-time stream computation. These frameworks address developers' basic requirements for distributed computing, including scalability and fault tolerance. To keep up with business innovation, new computing frameworks continue to emerge, and enterprises and organizations need to run multiple computing frameworks on the same computing cluster, meeting business needs by combining them.

There are two main existing solutions for sharing a computing cluster:

1) statically partition the computing resources of the data center, with the computing cluster in each partition dedicated to running one computing framework, such as a Hadoop cluster, a Spark cluster or a Storm cluster;

2) manage all computing resources through cloud computing Infrastructure as a Service (IaaS), and use virtualization technology, such as KVM, to provide each computing framework with a group of virtual machines.

The above schemes have the following disadvantages:

1) different computing frameworks emphasize different computing resources, and static partitioning leads to low overall resource utilization, poor scalability and reliability, and high maintenance cost;

2) the specific scheduling requirements of enterprise users are not considered;

3) new computing frameworks that appear in the future cannot be supported;

4) the placement of computing capacity relative to data storage is not optimized: different computing frameworks cannot share the same local data source, network transfer load is high, and computation cannot be moved to the data to improve data access efficiency.

The root cause of these shortcomings is that the static partitioning and virtual machine schemes above differ from existing big data computing frameworks in the granularity at which distributed computing resources are allocated. The frameworks generally use a fine-grained resource sharing model in which a single compute node can run multiple computing tasks at the same time, improving resource utilization and data access efficiency. These frameworks were all developed independently, and the existing schemes cannot provide fine-grained resource sharing between different computing frameworks.

Summary of the invention

In view of the above technical problems, the present invention aims to provide a cluster resource configuration and scheduling method supporting multiple big data computing frameworks. Through a unified set of interfaces, different big data computing frameworks can access the computing resources of the cluster; fine-grained sharing of computing resources between different frameworks is achieved through dynamic allocation and contract transactions; and the extensible allocation scheme can adapt to the business needs of different enterprises.

To achieve the above object, the technical solution of the present invention is as follows:

A cluster resource configuration and scheduling method supporting multiple big data computing frameworks, characterized by comprising the following steps:

S01: add a corresponding computing framework scheduler for each computing framework and deploy it into the overall system; the master control node collects all tradable computing resources of the compute nodes and pushes them to the computing framework schedulers, and the corresponding computing framework scheduler decides, by way of a contract transaction, whether to accept and use the resources;

S02: if the computing framework accepts the allocated resources, scheduling proceeds to the second layer: the framework's own distributed scheduling assigns computing tasks to the corresponding computing resources and notifies the master control node, which then notifies the corresponding compute nodes to start the corresponding computing framework executor to perform the computing tasks;

S03: if the computing framework declines the currently allocated computing resources, the master control node reallocates the resources and continues to send resource transaction information to the computing frameworks;

S04: perform fine-grained fair allocation scheduling over multiple computing resource types; the resource allocation to each computing framework is determined by that framework's dominant resource (the resource it emphasizes most); among the shares of the various computing resources obtained by each framework, the share of its dominant resource should be the largest, and the dominant-resource shares obtained by the frameworks should be as equal as possible.

Preferably, step S04 comprises:

S11: query the registered computing framework schedulers for the computing resource vector required by a single computing task, and compute, for each resource in the vector, its share of the cluster's total amount of that resource;

S12: sort the shares of all resources; the resource with the largest share is the dominant resource; when a computing framework has newly registered, repeat step S11; otherwise continue;

S13: compute the dominant-resource share already allocated to each computing framework, sort the frameworks by dominant-resource share, and allocate resources to the framework with the smallest share; when all resources required by that framework have been satisfied, remove the framework and proceed to the next round of allocation;

S14: repeat step S13 until all computing resources of the cluster have been allocated.
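
Steps S11 to S14 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `dominant_resource` and `allocate`, the per-framework task counts, and the rule that a framework is also removed when its next task no longer fits in the free resources are all assumptions made for illustration. Resources are allocated one task-sized vector at a time.

```python
from typing import Dict

Resources = Dict[str, float]


def dominant_resource(task_demand: Resources, cluster_total: Resources) -> str:
    """S11/S12: share of each demanded resource in the cluster total; the largest wins."""
    shares = {name: amount / cluster_total[name] for name, amount in task_demand.items()}
    return max(shares, key=lambda name: shares[name])


def allocate(cluster_total: Resources,
             task_demand: Dict[str, Resources],
             pending_tasks: Dict[str, int]) -> Dict[str, int]:
    """S13/S14: repeatedly serve the framework with the smallest dominant share.

    Returns the number of tasks granted to each framework (hypothetical output format).
    """
    free = dict(cluster_total)
    dominant = {fw: dominant_resource(d, cluster_total) for fw, d in task_demand.items()}
    allocated = {fw: 0.0 for fw in task_demand}  # amount of dominant resource held so far
    remaining = dict(pending_tasks)
    launched = {fw: 0 for fw in task_demand}
    # Sorted so that ties between equal shares are broken deterministically by name.
    active = sorted(fw for fw in task_demand if remaining.get(fw, 0) > 0)

    while active:
        fw = min(active, key=lambda f: allocated[f] / cluster_total[dominant[f]])
        demand = task_demand[fw]
        if any(free[name] < amount for name, amount in demand.items()):
            # Assumption: a framework whose next task does not fit leaves the round.
            active.remove(fw)
            continue
        for name, amount in demand.items():
            free[name] -= amount
        allocated[fw] += demand[dominant[fw]]
        launched[fw] += 1
        remaining[fw] -= 1
        if remaining[fw] == 0:
            # S13: all resources required by this framework are satisfied.
            active.remove(fw)
    return launched
```

For a hypothetical cluster of 9 CPUs and 18 GB of memory, a framework A whose tasks need 1 CPU and 4 GB, and a framework B whose tasks need 3 CPUs and 1 GB, `allocate` grants 3 tasks to A and 2 tasks to B, so both frameworks hold two thirds of their respective dominant resource.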

Compared with the prior art, the beneficial effects of the invention are:

1. Through a two-layer scheduling architecture and contract transactions, the invention lets multiple big data computing frameworks share the computing resources of the cluster and achieves dynamic allocation of cluster resources. Reusing each framework's existing distributed scheduling ensures support for new computing frameworks. The fair scheduling method for fine-grained resource allocation satisfies the computing demands of the different frameworks as far as possible and raises the overall resource utilization of the cluster, thereby improving the overall efficiency of the data center.

2. The method allocates the computing resources of the cluster to different computing frameworks dynamically and efficiently, improving the overall resource utilization of the cluster and the reliability and scalability of the computing service.

Brief description of the drawings

Fig. 1 is the scheduling architecture diagram of the cluster resource configuration and scheduling method supporting multiple big data computing frameworks according to the present invention;

Fig. 2 is the resource allocation sequence diagram of the cluster resource configuration and scheduling method supporting multiple big data computing frameworks according to the present invention;

Fig. 3 is the scheduling flow chart of the cluster resource configuration and scheduling method supporting multiple big data computing frameworks according to the present invention.

Detailed description of the invention

To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings. It should be understood that these descriptions are merely exemplary and are not intended to limit the scope of the invention. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the invention.

Embodiment:

The technical solution mainly comprises two aspects:

1) A scheduling architecture based on a master/slave two-layer scheduling mechanism and contract transactions

As shown in Fig. 1, N existing distributed big data computing frameworks need to share the computing resources of the cluster. A corresponding computing framework scheduler is added for each computing framework and deployed in the overall system. This scheduler is responsible for conducting resource contract transactions with the resource allocation module of the master control node, and decides whether to accept or decline the allocated computing resources according to the requirements of its framework;

There are K master control nodes, which provide service in a load-balanced manner. Each contains a resource allocation module that is responsible for collecting resource usage from every compute node and pushing the corresponding resources to each computing framework scheduler;

There are M compute nodes. Each compute node reports its local resource usage and, as required, starts the corresponding computing framework executor to perform the computing tasks of the framework.
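
How a resource allocation module might turn the usage reports of the compute nodes into tradable resources can be sketched as follows. This is only an illustration under assumptions: the `NodeReport` type, its fields and the `tradable_resources` function are hypothetical names, and the source does not specify the report format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeReport:
    """Hypothetical usage report sent by one compute node to a master control node."""
    node_id: str
    total: Dict[str, float]  # installed capacity per resource type
    used: Dict[str, float]   # amount currently in use per resource type


def tradable_resources(reports: List[NodeReport]) -> Dict[str, Dict[str, float]]:
    """Collect the free resources of every node that still has something to offer."""
    offers: Dict[str, Dict[str, float]] = {}
    for report in reports:
        free = {name: report.total[name] - report.used.get(name, 0)
                for name in report.total}
        # Nodes with no spare capacity at all are left out of the offer set.
        if any(amount > 0 for amount in free.values()):
            offers[report.node_id] = free
    return offers
```

The returned per-node dictionaries are what the master control node would then push to the computing framework schedulers.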

As shown in Fig. 2, the invention uses a two-layer scheduling mechanism. In the first layer, the master control node collects all tradable computing resources of the compute nodes and pushes them to the computing framework schedulers, and the corresponding computing framework scheduler decides, by way of a contract transaction, whether to accept and use the resources. If the computing framework accepts the allocated resources, scheduling proceeds to the second layer: the framework's own distributed scheduling assigns computing tasks to the corresponding computing resources and notifies the master control node, which then notifies the corresponding compute nodes to start the corresponding computing framework executor to perform the computing tasks. If the computing framework declines the currently allocated computing resources, the master control node reallocates the resources and continues to send resource transaction information to the computing frameworks.
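
The two-layer exchange can be sketched in Python as below. The class and method names (`FrameworkScheduler`, `Master`, `on_offer`, `offer_round`) are hypothetical, per-task demands are assumed to be positive, and frameworks are offered resources in plain list order rather than in fairness order, which is a simplification of the method described in the source.

```python
from typing import Dict, List, Tuple

Resources = Dict[str, float]


class FrameworkScheduler:
    """Second layer: a framework decides what to do with an offer (hypothetical API)."""

    def __init__(self, name: str, task_demand: Resources, pending_tasks: int) -> None:
        self.name = name
        self.task_demand = task_demand  # assumed non-empty, all amounts > 0
        self.pending_tasks = pending_tasks

    def on_offer(self, node_id: str, offer: Resources) -> List[Resources]:
        """Return one resource vector per task to launch; an empty list declines."""
        fit = min(int(offer.get(name, 0) // amount)
                  for name, amount in self.task_demand.items())
        count = min(fit, self.pending_tasks)
        self.pending_tasks -= count
        return [dict(self.task_demand) for _ in range(count)]


class Master:
    """First layer: pushes the free resources of each node to the framework schedulers."""

    def __init__(self, free_by_node: Dict[str, Resources]) -> None:
        self.free_by_node = free_by_node

    def offer_round(self, schedulers: List[FrameworkScheduler]) -> List[Tuple[str, str]]:
        launched: List[Tuple[str, str]] = []
        for node_id, free in self.free_by_node.items():
            for scheduler in schedulers:
                tasks = scheduler.on_offer(node_id, dict(free))
                if not tasks:
                    continue  # declined: the same resources go to the next framework
                for task in tasks:
                    for name, amount in task.items():
                        free[name] -= amount
                    # Here the compute node would be told to start the framework executor.
                    launched.append((node_id, scheduler.name))
        return launched
```

Calling `offer_round` again in a later round corresponds to the master control node reallocating resources and continuing to send resource transaction information.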

2) A fair scheduling method for fine-grained resource allocation

In a shared cluster, different big data computing frameworks emphasize different resources: some need large amounts of disk and network, some memory-based frameworks need large amounts of physical memory, and some compute-intensive ones need large amounts of CPU. Consider fine-grained fair allocation scheduling over multiple computing resource types: the resource allocation to each computing framework should be determined by that framework's dominant resource (the resource it emphasizes most), and among the shares of the various computing resources a framework obtains (relative to the total resources of the cluster), the share of its dominant resource should be the largest. For fairness, the dominant-resource shares obtained by the frameworks should be as equal as possible.
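
As a worked illustration of the fairness criterion, consider a hypothetical cluster with two resource types and two frameworks A and B. The numbers and symbols below (total $R$, per-task demands $d_A$, $d_B$, task counts $x$, $y$, dominant-resource shares $s_A$, $s_B$) are assumptions chosen for the example and do not come from the source.

```latex
\[
\begin{aligned}
&\text{Cluster total: } R = (9\ \text{CPU},\ 18\ \text{GB}) \\
&\text{Per-task demand: } d_A = (1,\ 4), \qquad d_B = (3,\ 1) \\
&\text{Demand shares: } A:\ \tfrac{1}{9},\ \tfrac{4}{18} \ \Rightarrow\ \text{memory is dominant}; \qquad
  B:\ \tfrac{3}{9},\ \tfrac{1}{18} \ \Rightarrow\ \text{CPU is dominant} \\
&\text{With } x \text{ tasks for } A \text{ and } y \text{ tasks for } B:\quad
  x + 3y \le 9, \qquad 4x + y \le 18, \qquad \tfrac{4x}{18} = \tfrac{3y}{9} \\
&\Rightarrow\ y = \tfrac{2}{3}x, \qquad 3x \le 9, \qquad \tfrac{14}{3}x \le 18
  \ \Rightarrow\ x = 3,\ y = 2 \\
&s_A = \tfrac{12}{18} = \tfrac{2}{3}, \qquad s_B = \tfrac{6}{9} = \tfrac{2}{3}
\end{aligned}
\]
```

In this example A receives 3 CPUs and 12 GB, B receives 6 CPUs and 2 GB, each framework's largest share is that of its own dominant resource, and the two dominant-resource shares are equal.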

The flow chart of this algorithm is shown in Fig. 3:

S14: repeat step S13 until all computing resources of the cluster have been allocated.

It should be understood that the above specific embodiments are used only to illustrate or explain the principles of the present invention and do not limit it. Therefore, any modification, equivalent substitution, improvement and the like made without departing from the spirit and scope of the invention shall fall within its scope of protection. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and boundaries of the claims, or within equivalents of that scope and those boundaries.

Claims

1. A cluster resource configuration and scheduling method supporting multiple big data computing frameworks, characterized by comprising the following steps:

2. The cluster resource configuration and scheduling method supporting multiple big data computing frameworks according to claim 1, characterized in that step S04 comprises:

S14: repeat step S13 until all computing resources of the cluster have been allocated.