CN107038069B - Dynamic label matching DLMS scheduling method under Hadoop platform - Google Patents


Info

Publication number
CN107038069B
Authority
CN
China
Prior art keywords
task
node
label
nodes
classification
Prior art date
Legal status
Expired - Fee Related
Application number
CN201710181055.0A
Other languages
Chinese (zh)
Other versions
CN107038069A (en)
Inventor
毛韦
竹翠
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201710181055.0A priority Critical patent/CN107038069B/en
Publication of CN107038069A publication Critical patent/CN107038069A/en
Application granted granted Critical
Publication of CN107038069B publication Critical patent/CN107038069B/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network

Abstract

The invention discloses a dynamic label matching (DLMS) scheduling method under the Hadoop platform, belonging to the field of computer software. To address the large performance differences among Hadoop cluster nodes, the randomness of resource allocation, and overlong job execution times, a scheduler is provided that dynamically matches node performance labels (hereinafter node labels) with job type labels (hereinafter job labels). Nodes are initially classified and given original node labels; each node then detects its own performance indexes to generate a dynamic node label. Jobs are classified according to partial running information to generate job labels, and the resource scheduler allocates node resources to jobs carrying the corresponding label. Experimental results show that job execution time is greatly shortened compared with the schedulers built into YARN.

Description

Dynamic label matching DLMS scheduling method under Hadoop platform
Technical Field
The invention belongs to the field of computer software, and relates to design and implementation of a dynamic label matching DLMS scheduling method based on a Hadoop platform.
Background
In early Hadoop versions, resource scheduling management and the MapReduce framework were integrated in one module, so the code was poorly decoupled, could not be extended well, and did not support multiple computing frameworks. The Hadoop open-source community therefore designed a new-generation Hadoop system with a completely new architecture, Hadoop 2.0, which extracts resource scheduling into a new framework, YARN. A scheduling algorithm suited to a given environment can satisfy users' job requests while effectively improving the overall performance of the Hadoop platform and the resource utilization of the system. YARN provides three schedulers by default: the first-in-first-out scheduler (FIFO), the Fair Scheduler, and the Capacity Scheduler. Hadoop defaults to the FIFO scheduler, whose first-in-first-out strategy is simple and easy to implement, but it is unfavorable to short jobs and supports neither shared clusters nor multi-user management. The fair scheduling algorithm proposed by Facebook considers the differences between users and the resource requirements of jobs and lets users share cluster resources fairly, but its job resource configuration strategy is not flexible enough, easily wastes resources, and does not support job preemption. The capacity scheduling algorithm proposed by Yahoo supports multiple users sharing multiple queues and is flexible in computing capacity, but it does not support job preemption and easily falls into local optima.
However, in actual enterprise production, as data volumes grow, new nodes are added to the cluster every year, so the performance of cluster nodes differs significantly and heterogeneous clusters are common. If a machine learning task with a large amount of computation is assigned to a node with weak CPU capability, the overall execution time of the job is obviously affected. The invention provides a resource scheduling method (DLMS) that dynamically matches node performance with job category labels: a machine with better CPU performance carries a CPU label, a machine with better disk IO performance carries an IO label, and ordinary machines carry a common label. According to its classification, a job likewise carries a CPU, IO, or common label and enters the corresponding label queue, and the scheduler allocates the resources of nodes with a given label to jobs with the same label as far as possible. This reduces job running time, improves the resource utilization of the system, and improves overall system efficiency.
Disclosure of Invention
The scheduling method provided by the invention initially classifies the cluster nodes and gives them corresponding labels. Before sending a heartbeat, the NodeManager performs self-detection and dynamically adjusts its original label. Jobs are classified with a machine learning classification algorithm and given corresponding labels; jobs are dynamically ordered according to attributes such as job priority and waiting time set by the user, and resources of a given label are allocated to the jobs in the corresponding label queue.
The scheduling method provided by the invention mainly comprises the following modules:
(1) original classification of cluster nodes and dynamic classification label thereof
The cluster nodes first need to be initially classified according to the performance of their CPUs and disk IO. Each node in the cluster independently runs a task of each specified type and records the time taken; according to the relation between a node's time for a single task and the average running time of all nodes in the cluster, the nodes are divided into CPU-type nodes, disk-IO-type nodes, and common nodes.
During cluster operation, if running some jobs overloads a node, the node's label is downgraded directly to the common label. Suppose a node's initial label is CPU-type and a CPU-type task is running on it: although part of the node's resources remain unused, the node has at that moment lost its CPU performance advantage. To handle this situation a dynamic label method is adopted: when the NodeManager sends a heartbeat to the ResourceManager, the CPU and IO utilization of the node machine are detected, and if the utilization exceeds a threshold the node is given the common label. This detection is repeated on every heartbeat, realizing dynamic node labels. The threshold can be set in a configuration file; if the user does not configure it, a system default is used.
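The per-heartbeat self-detection described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the 0.8 thresholds stand in for the configurable values mentioned above.

```python
def dynamic_label(original_label, cpu_util, io_util,
                  cpu_threshold=0.8, io_threshold=0.8):
    """Re-evaluate a node's label before each heartbeat.

    If the node's CPU or disk-IO utilization exceeds its threshold,
    the node has temporarily lost its performance advantage and is
    downgraded to the 'common' label; otherwise the original label
    from the initial classification is kept.
    """
    if original_label == "cpu" and cpu_util > cpu_threshold:
        return "common"
    if original_label == "io" and io_util > io_threshold:
        return "common"
    return original_label
```

A heavily loaded CPU-label node thus reports itself as common for that heartbeat, and recovers its CPU label automatically once the load drops.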
(2) Obtaining and returning Map execution information
A Hadoop job is generally divided into a Map phase and a Reduce phase. A large job usually has hundreds of Maps or more, and most of a job's time is spent on Map-phase computation, but every Map has identical execution logic. Therefore, the running information of the first Map process of the job is collected and passed to the scheduler when the NodeManager sends a heartbeat to the ResourceManager, and the scheduler classifies the job according to this information.
In an enterprise production environment, jobs with the same logic run every day, so the user often already knows which label a job should carry. A job type label can be set on the command line or in code; the scheduler checks it during scheduling, and if the user has labelled the job, the classification step is skipped and the job is scheduled directly.
(3) Multi-priority queue
To meet the requirements of different users and prevent starvation of small jobs, a job priority scheme is adopted. Five queues are newly built in the scheduler: an original queue, a waiting priority queue, a CPU priority queue, an IO priority queue, and a common priority queue. When a user submits a job, part of its Maps are first run and their running information is collected; the job then enters the waiting priority queue until the Map information is returned and the job is classified; finally the job enters the queue corresponding to its label.
(4) Job classification
Data needs to be preprocessed before classification. Data preprocessing techniques were developed to improve the quality of data mining; common methods include data cleaning, data integration, data transformation, and data reduction. Applying them before mining greatly improves the quality of the mining model and reduces the time required for the actual mining. Here preprocessing mainly means data normalization: each variable is linearly transformed to a new scale on which its minimum value is 0 and its maximum value is 1, ensuring all variable data are less than or equal to 1.
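The normalization described above is standard min-max scaling; a minimal sketch (the constant-variable fallback is an assumption not stated in the text):

```python
def min_max_normalize(values):
    """Linearly rescale a variable's samples to [0, 1]:
    x' = (x - min) / (max - min), so the smallest value maps to 0
    and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant variable: map everything to 0
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]
```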
For job classification, a naive Bayes classifier, which is simple, widely used, and effective, is selected. If the user has already specified the job type on the command line or in the task code, this step is skipped and the job directly enters the corresponding queue to wait for resource allocation.
(5) Data locality
One principle followed in Hadoop is that "moving computation is cheaper than moving data": moving the computation to the node that holds the data costs less and performs better than moving the data to the compute node. For data locality the invention adopts a delayed degradation scheduling strategy.
The beneficial effects are as follows:
1. For heterogeneous cluster environments, the invention provides a dynamic label matching scheduling method: nodes and jobs are classified, job priorities are computed from job characteristics and the attributes of the submitting user, resources are matched to jobs of the same type during allocation, and, considering the relation between node performance and the current task load, node labels are adjusted dynamically through self-detection. Finally, the performance of the algorithm is comparatively analyzed through experiments.
2. For the data locality problem, the invention provides a delayed degradation algorithm. Degradation has three levels: the current local node, the local rack node, and a random node; within a certain delay time, the scheduler waits before lowering the locality level, which improves data locality.
3. The invention adopts a dynamic label method: different types of jobs are first run in advance and nodes are classified by comparing each node's running time with the average time over all nodes of the cluster; each node then self-detects its performance according to the load of its running tasks and generates a corresponding new label.
4. The invention proposes classifying jobs: since the Map parts of a MapReduce job share the same processing logic, a job can be classified according to the information from the part of it executed in advance.
Drawings
FIG. 1 is a flowchart of an overall framework for job scheduling;
FIG. 2 is a flow chart of a scheduling algorithm;
FIG. 3 is a comparison graph of the total running time of three jobs under different scheduling algorithms;
FIG. 4 is a graph of Container distribution quantity under 500M data quantity under DLMS;
FIG. 5 is a graph of Container distribution quantity under 1G data quantity under DLMS;
FIG. 6 is a graph of Container distribution quantity under 1.5G data quantity under DLMS;
FIG. 7 is a run time comparison graph of a job group under different scheduling algorithms;
Detailed Description
In order to make the objects, technical solutions and features of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. The YARN scheduling framework is shown in fig. 1.
The individual steps are explained below:
(1) The user submits an application to YARN, including the user program and the command for starting the ApplicationMaster.
(2) The ResourceManager assigns the first Container to the application and communicates with the corresponding NodeManager, asking it to start the application's ApplicationMaster.
(3) After the ApplicationMaster registers with the ResourceManager, it applies for resources for each task and monitors the tasks' running state until the run finishes.
(4) Before sending a heartbeat, the NodeManager performs self-detection to generate a dynamic node label and reports its resources to the ResourceManager.
(5) Tasks are classified into the different label queues and ordered by priority to wait for resource allocation.
(6) The ApplicationMaster applies for and obtains resources from the ResourceManager via an RPC protocol.
(7) According to the node label and resources reported by the NodeManager, the scheduler allocates the node's resources to jobs in the corresponding label queue.
(8) After obtaining resources, the ApplicationMaster communicates with the corresponding NodeManager and asks it to start the task.
(9) After the NodeManager sets up the task's running environment (environment variables, JAR packages, binary programs, and so on), it writes the task start command into a script and starts the task by running the script.
(10) Each task reports its state and progress to the ApplicationMaster through an RPC protocol, so the ApplicationMaster can restart a task when it fails.
(11) After the application finishes running, the ApplicationMaster unregisters from the ResourceManager and shuts itself down.
Firstly, initially classifying cluster physical nodes, wherein the classification method comprises the following steps:
(1) Let the set of cluster machine nodes be N = {Ni | i ∈ [1, n]}, where n is the total number of nodes, i is a positive integer starting from 1, and Ni represents the i-th physical machine in the cluster.
(2) Execute one CPU-type, one IO-type, and one common-type job with the same task amount on each node and record the execution times: Tcpu(i) is the time taken to execute the CPU job on node Ni; Tio(i) is the time taken to execute the IO job on node Ni; Tcom(i) is the time taken to execute the common job on node Ni.
(3) Calculate the cluster average time of each job type, using the formula:
Avgj = (1/n) * Σ_{i=1..n} Tj(i),  j ∈ {cpu, io, com}
where j represents the job type. The difference between each node's time and the cluster average for that job type is then computed: if Tcpu(i) < Avgcpu, the node is given the CPU-type original label; if Tcpu(i) > Avgcpu, the node is given the common original label. IO-type labels are obtained in the same way from Tio(i) and Avgio. After comparison a node may hold several labels; the label with the largest time saving is selected as the node's final label.
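The comparison rule above can be sketched as follows. The function name, the dictionary representation of the benchmark times, and the use of relative saving as the tie-breaker are illustrative assumptions; the patent only specifies comparing against the averages and keeping the most time-saving label.

```python
def classify_node(times, avg_times):
    """Assign an original label to one node.

    times     : {'cpu': T, 'io': T, 'com': T} benchmark times on this node
    avg_times : cluster-wide average time for each job type
    A label qualifies when the node beats the cluster average for that
    job type; among qualifying labels, the one with the largest relative
    time saving wins. If none qualify, the node is labelled 'common'.
    """
    savings = {j: (avg_times[j] - times[j]) / avg_times[j]
               for j in times if times[j] < avg_times[j]}
    if not savings:
        return "common"
    best = max(savings, key=savings.get)
    return best if best != "com" else "common"
```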
Let M be the set of Map running information that needs to be collected: M = {MIn, MOut, Rate, Act, Mcpu, Zcpu, Mrate}, where MIn is the Map input data amount, MOut the Map output data amount, Rate the ratio of input to output data amount, Act the average CPU usage, Mcpu the median CPU usage, Zcpu the number of times CPU usage exceeds 90%, and Mrate the memory usage. These later become the feature attributes for job classification. During the experiments it was found that simply computing the average CPU time cannot reflect job characteristics well: CPU-type jobs exceed 90% CPU utilization many times, while other job types do so relatively rarely, so this count is also added to the information returned by the Map.
A user-defined two-layer weight design is adopted for queue priority. The weight of the job size attribute is worthNum, with three levels: long, middle, and short. The weight of the job owner attribute is worthUser, with two levels: user and root. The weight of the job urgency is worthEmergence, with three levels: high, normal, and low priority. The weight of the waiting time is worthWait, where the waiting time is computed as waitTime = nowTime − submitTime. The priority number of each task is calculated and the tasks are ordered in the corresponding queue. The four attribute weights sum to 100%:
worthNum + worthUser + worthEmergence + worthWait = 100%
The final weight calculation formula is:
finalWorth = worthNum*num + worthUser*user + worthEmergence*priority + worthWait*waitTime
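The weighted sum above can be sketched as follows; the concrete weight values and the numeric encoding of the level attributes are illustrative assumptions, since the patent leaves them to user configuration.

```python
# Illustrative weights for the four attributes; they must sum to 1
# (i.e. 100%). The real values are user-configured.
WEIGHTS = {"num": 0.30, "user": 0.20, "emergence": 0.30, "wait": 0.20}

def final_worth(num, user, priority, wait_time, weights=WEIGHTS):
    """finalWorth = worthNum*num + worthUser*user
                  + worthEmergence*priority + worthWait*waitTime
    where num, user, priority are numeric scores for the job's size,
    owner, and urgency levels, and wait_time is nowTime - submitTime."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return (weights["num"] * num + weights["user"] * user
            + weights["emergence"] * priority
            + weights["wait"] * wait_time)
```

Jobs in a queue would then be sorted by descending `final_worth`, so a long-waiting job gradually overtakes newer ones and starvation is avoided.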
For job classification a naive Bayes classifier is adopted; the specific classification steps are as follows:
(1) Calculate the conditional probability that a job is a CPU, IO, or common type job given its features:
P(job=cpu|V1,V2...Vn)
P(job=io|V1,V2...Vn)
P(job=com|V1,V2...Vn)
where job ∈ {cpu, io, com} is the job category label and Vi is a feature attribute of the job.
(2) According to the Bayes formula P(B|A) = P(AB)/P(A):
P(job=cpu|V1,V2...Vn) = P(V1,V2...Vn|job=cpu)P(job=cpu) / P(V1,V2...Vn)
Assuming the features Vi are mutually independent, the independence assumption gives
P(V1,V2...Vn|job=cpu) = Π_{i=1..n} P(Vi|job=cpu)
(3) In the actual calculation, P(V1,V2,...,Vn) is the same for every job type and can be ignored, giving
P(job=cpu|V1,V2...Vn) ∝ P(job=cpu) Π_{i=1..n} P(Vi|job=cpu)
and similarly
P(job=io|V1,V2...Vn) ∝ P(job=io) Π_{i=1..n} P(Vi|job=io)
P(job=com|V1,V2...Vn) ∝ P(job=com) Π_{i=1..n} P(Vi|job=com)
Whether the job is a CPU-type, IO-type, or common-type job is determined by which of these probability values is largest.
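The argmax rule above can be sketched with a toy categorical naive Bayes classifier. This is a simplified illustration under assumed discretized features (the patent's features are continuous measurements, which would first be normalized and binned); the Laplace smoothing is also an added assumption.

```python
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (feature_tuple, label).
    Returns class priors, per-position conditional counts, and n."""
    priors = Counter(lbl for _, lbl in samples)
    cond = defaultdict(Counter)        # (position, label) -> value counts
    for feats, lbl in samples:
        for i, v in enumerate(feats):
            cond[(i, lbl)][v] += 1
    return priors, cond, len(samples)

def predict_nb(model, feats):
    """Pick the label maximizing P(label) * prod_i P(V_i | label);
    the shared denominator P(V_1..V_n) is ignored, as in the text."""
    priors, cond, n = model
    best, best_score = None, -1.0
    for lbl, cnt in priors.items():
        score = cnt / n                             # prior P(job=lbl)
        for i, v in enumerate(feats):
            c = cond[(i, lbl)]
            score *= (c[v] + 1) / (sum(c.values()) + 2)  # Laplace smoothing
        if score > best_score:
            best, best_score = lbl, score
    return best
```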
For data locality, a delayed degradation scheduling strategy is adopted. The specific idea of the strategy is as follows:
A delay time attribute is added to each job. Let Ti be the current delay count of the i-th job, i ∈ [1, n]; Tlocal denotes the local-node delay threshold and Track the rack-node delay threshold. When the scheduler allocates resources to a job, if the node offering resources is not the node holding the job's input data, Ti is incremented by 1, meaning the job has been delayed once, and the resource is allocated to another suitable job. When Ti > Tlocal, the job's locality is lowered to rack locality, and nodes within the rack may allocate resources to it; when Ti > Track, the job's locality is lowered to a random node. Tlocal and Track are set in a configuration file, configured by the user according to the cluster conditions. The delayed scheduling strategy ensures that better locality is obtained within a certain delay time.
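The delayed degradation rule above can be sketched as follows; the numeric locality levels, function names, and default thresholds are illustrative assumptions (the patent reads Tlocal and Track from a configuration file).

```python
LOCAL, RACK, ANY = 0, 1, 2   # locality levels, best to worst

def allowed_locality(delay_count, t_local, t_rack):
    """Map a job's accumulated delay count to the worst locality level
    the scheduler may currently accept: start node-local; after more
    than t_local skipped offers fall back to rack-local; after more
    than t_rack accept any node."""
    if delay_count <= t_local:
        return LOCAL
    if delay_count <= t_rack:
        return RACK
    return ANY

def try_assign(job_delay, offer_level, t_local=3, t_rack=6):
    """Return (assigned, new_delay): assign if the offered node's
    locality level is within what the job may accept; otherwise skip
    the offer and increment the job's delay counter."""
    if offer_level <= allowed_locality(job_delay, t_local, t_rack):
        return True, job_delay
    return False, job_delay + 1
```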
The basic idea of the DLMS scheduling method is to run part of each job in advance, classify the job according to the information it returns, and then allocate the resources of labelled nodes to tasks in the corresponding queue. The basic process is as follows:
Step 1: when a node reports resources to the ResourceManager through a heartbeat, if the original queue is not empty, the jobs in it are traversed; any job whose type label was specified on the command line or in the program is moved to the corresponding label priority queue and removed from the original queue.
Step 2: jobs in the original queue without a specified type label are moved to the waiting queue.
Step 3: if the waiting priority queue is not empty, the jobs in it are classified into the corresponding label priority queues.
Step 4: if the job queue corresponding to the node's performance label is not empty, the node's resources are allocated to that queue and the allocation ends.
Step 5: a counter records how many times a node's resources have been offered without being assigned; if it exceeds the number of nodes in the cluster, the node's resources are allocated to the queues in the order CPU, IO, common, waiting, and scheduling ends. This step prevents the situation in which CPU-type node resources are exhausted by too many jobs in the CPU queue while nodes with other labels still have resources that jobs cannot obtain. The flow chart of the algorithm is shown in fig. 2.
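The five steps above can be sketched as a single heartbeat-handling routine. This is a simplified sketch under assumptions about the job representation, the classifier interface, and the starvation counter, not the patent's implementation.

```python
from collections import deque

def on_heartbeat(node_label, queues, classify, misses, n_nodes):
    """One DLMS allocation round when a node heartbeats.

    queues : {'original','waiting','cpu','io','common'} -> deque of jobs,
             where a job is (name, user_label_or_None, features).
    classify(features) -> 'cpu' | 'io' | 'common' (e.g. naive Bayes).
    misses : times this node's offer went unused (starvation guard).
    Returns (job_or_None, new_misses).
    """
    # Steps 1-2: route jobs out of the original queue.
    while queues["original"]:
        name, user_label, feats = queues["original"].popleft()
        dest = user_label if user_label else "waiting"
        queues[dest].append((name, user_label, feats))
    # Step 3: classify waiting jobs into the label queues.
    while queues["waiting"]:
        name, user_label, feats = queues["waiting"].popleft()
        queues[classify(feats)].append((name, user_label, feats))
    # Step 4: prefer the queue matching this node's label.
    if queues[node_label]:
        return queues[node_label].popleft(), 0
    # Step 5: after too many unused offers, serve any queue in the
    # fixed order cpu, io, common so that no label queue starves.
    if misses + 1 > n_nodes:
        for q in ("cpu", "io", "common"):
            if queues[q]:
                return queues[q].popleft(), 0
    return None, misses + 1
```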
Experimental Environment
This section verifies the actual effect of the proposed DLMS scheduler experimentally. The environment is a fully distributed Hadoop cluster built from 5 PCs, each uniformly configured with Ubuntu 12.04.1, JDK 1.6, Hadoop 2.5.1, 2 GB of memory, and a 50 GB hard disk. The NameNode has 2 CPU cores; DataNode1 has 2, DataNode2 has 4, DataNode3 has 2, and DataNode4 has 4.
Results and description of the experiments
First, a WordCount (IO-type) job and a Kmeans (CPU-type) job with a data size of 128M are prepared; each job is run 6 times on each of the 4 nodes and the running times are recorded. In Table 1, s is the unit of time, avg is the node's average time for the corresponding label task, and avavg is the overall average time of all nodes for that task. The rate is calculated as:
rate = (avg − avavg) / avavg × 100%
A negative sign indicates the node's average time is below the overall average, and a positive sign indicates it is above.
From Table 1 it can be seen that DataNode1 saves time on both tasks; the label with the larger saving, CPU, is taken as the machine's original label. DataNode2 receives the IO label, and DataNode3 and DataNode4 are common machines.
Table 1 original classification experimental table
Results of the experiments and analysis thereof
Several jobs whose types are clearly distinguishable are used. WordCount reads a large amount of data and writes much intermediate data in the Map phase, with essentially no arithmetic in either the Map or Reduce phase, so it is characterized as an IO-type job. Kmeans computes distances between many points in both the Map and Reduce phases and writes little intermediate data, so it is characterized as a CPU-type job. TopK writes no large amount of data to disk in the Reduce phase and performs no heavy computation, only simple comparisons, so it is treated as a common task.
Verification is carried out through two groups of experiments. In the first group, the scheduler is set to FIFO, and the WordCount, Kmeans, and TopK jobs are each run 3 times with data volumes of 500M, 1G, and 1.5G; the average of the 3 runs is recorded as the final time. The scheduler is then switched and the same experiment is repeated for the Capacity and DLMS schedulers. For the DLMS scheduler, the distribution of each job's Containers across the cluster is also recorded: a Container is the unit into which cluster resources are divided, and each Map and Reduce process in YARN is represented by one Container, so a Container's share at a node indicates the share of the job's tasks executed there. The abscissa of fig. 3 is the job's data size and the ordinate is the total time for running the three jobs WordCount, Kmeans, and TopK together. As the data volume increases, the DLMS scheduler saves about 10-20% of the time compared with the other schedulers.
Since DLMS allocates the resources of nodes with a given label to jobs with that label, and the Maps and Reduces of a job run on nodes in the form of Containers, figs. 4 to 6 show the Container counts of jobs with different data volumes under the DLMS scheduler. According to the original classification in the previous section, Node1 is a CPU-label node, Node2 and Node3 are common-label nodes, and Node4 is an IO-label node; WordCount is an IO-type job, TopK a common-type job, and Kmeans a CPU-type job. The figures show that WordCount places more Containers on Node4, TopK places more on the common nodes Node2 and Node3, and Kmeans places more on Node1. This distribution of the Containers of different jobs across the cluster nodes shows that the DLMS scheduler raises the probability that the resources of a labelled node are allocated to jobs with the corresponding label.
In the second group of experiments, 5 jobs form one job group: WordCount jobs of 128M and 500M, Kmeans jobs of 128M and 500M, and a TopK job of 500M. The 5 jobs are submitted and run simultaneously to simulate continuous job execution under the different schedulers, and the total time for the group to finish is recorded; the group is run 3 times under each scheduler. The results are shown in fig. 7: the time saved by the proposed DLMS scheduler over Hadoop's built-in schedulers executing the same job group is apparent, with DLMS saving about 20% of the runtime compared with the FIFO scheduler and about 10% compared with the Capacity scheduler.

Claims (1)

  1. The DLMS scheduling method for dynamic label matching under the Hadoop platform is characterized by comprising the following steps:
    original classification of cluster nodes and dynamic classification labels thereof;
    firstly, the cluster nodes need to be originally classified according to the performance of their CPUs (central processing units) and disk IO (input/output); each node in the cluster independently runs a task of a specified type and records the time taken, and according to the relation between this time and the average running time of all the nodes in the cluster, the nodes are divided into CPU-type nodes, disk-IO-type nodes and normal-type nodes;
    in the running process of the cluster nodes, if running some tasks overloads a node, the node's label is degraded directly to a common node; when a node's initial label is a CPU-type label and a CPU-type task runs on it, a dynamic label method is adopted: when the NodeManager sends a heartbeat to the ResourceManager, the CPU and IO utilization of the node machine are dynamically detected, and if the utilization exceeds a threshold, the node is given a common label; detection is performed once each time a heartbeat is sent, realizing the dynamic node label; the threshold is configured in a configuration file, and if not configured by the user, the system default value is used;
    (1) obtaining and transmitting Map process running information
    the running information of the first Map process of a task is collected; this running information is transmitted to the scheduler when the NodeManager sends a heartbeat to the ResourceManager, and the scheduler classifies the task according to it;
    if the user already knows the type of the task, a task type label is set for the task on the command line or in code; the scheduler checks for this label during scheduling, and if the user has labeled the task, the classification step is skipped and the task is scheduled directly;
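The check described above can be sketched as follows. The configuration key `task.type.label` and the function names are hypothetical, introduced only to illustrate the "user label first, otherwise wait for Map run info" order of precedence.

```python
def resolve_task_label(job_conf, first_map_stats, classify):
    """Return the task's label: the user-set label if present, otherwise the
    classifier's verdict once the first Map process's stats have arrived."""
    user_label = job_conf.get("task.type.label")  # assumed config key
    if user_label is not None:
        return user_label        # user-labeled task: skip classification
    if first_map_stats is None:
        return None              # still waiting for Map running information
    return classify(first_map_stats)
```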
    (2) multi-priority queue
    5 queues are newly built in the scheduler: an original queue, a waiting priority queue, a CPU priority queue, an IO priority queue and a common priority queue; when a user submits a task, it first enters the original queue, where some of the task's Map processes are run and their running information is collected; the task then enters the waiting priority queue until the Map running information is returned and the task is classified; finally, the task enters the queue corresponding to its classification label;
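The task's movement through the five queues can be sketched as below; the `Queues` class and method names are illustrative assumptions, not from the patent text.

```python
from collections import deque

class Queues:
    """Five-queue task flow: original -> waiting -> label-specific queue."""
    def __init__(self):
        self.original = deque()
        self.waiting = deque()
        self.by_label = {"cpu": deque(), "io": deque(), "common": deque()}

    def submit(self, task):
        self.original.append(task)          # step 1: newly submitted task

    def start_sampling(self):
        task = self.original.popleft()      # step 2: some Map processes run;
        self.waiting.append(task)           # wait for their running information

    def classify_front(self, label):
        task = self.waiting.popleft()       # step 3: running info returned,
        self.by_label[label].append(task)   # enqueue by classification label
        return task
```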
    (3) task classification
    the data are preprocessed before classification: for data normalization, the variable data are linearly transformed onto a new scale on which the minimum transformed value is 0 and the maximum is 1, ensuring all transformed values are less than or equal to 1;
    for task classification, a naive Bayes classifier is selected; if the user has added the task type on the command line or in the task code, classification can be skipped and the task directly enters the corresponding queue to wait for resource allocation;
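A minimal sketch of this preprocessing and classification step, assuming min-max normalization and a Gaussian naive Bayes over numeric Map-process features; the feature layout and two-class setup are assumptions, since the claim only names the classifier type.

```python
import math

def min_max_normalize(column):
    """Linearly map a list of values onto [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

class GaussianNB:
    """Tiny Gaussian naive Bayes: per-class priors plus per-feature mean/variance."""
    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.stats[label] = (
                len(rows) / len(X),                          # class prior
                [self._mean_var(col) for col in zip(*rows)], # per-feature stats
            )
        return self

    @staticmethod
    def _mean_var(col):
        m = sum(col) / len(col)
        v = sum((c - m) ** 2 for c in col) / len(col) or 1e-9  # avoid zero variance
        return m, v

    def predict(self, x):
        def log_post(label):
            prior, feats = self.stats[label]
            s = math.log(prior)
            for xi, (m, v) in zip(x, feats):
                s += -((xi - m) ** 2) / (2 * v) - 0.5 * math.log(2 * math.pi * v)
            return s
        return max(self.stats, key=log_post)
```

In use, each task's first-Map features (e.g. CPU time share vs. IO time share) would be normalized, then fed to `fit`/`predict` to yield the queue label.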
    (4) data locality
    data locality is handled by a delay degradation scheduling strategy.
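A delay degradation strategy of the kind named in the claim can be sketched as follows: a task first waits for a node-local slot, and only after being skipped a number of times is its locality requirement degraded to rack-local and then to any node. The data types, function name, and skip limits below are assumptions for illustration, not values from the patent.

```python
from collections import namedtuple

Node = namedtuple("Node", ["host", "rack"])
Task = namedtuple("Task", ["preferred_hosts", "preferred_racks"])

NODE_LOCAL_SKIPS = 3   # assumed: skips tolerated before degrading to rack-local
RACK_LOCAL_SKIPS = 6   # assumed: skips tolerated before running on any node

def locality_decision(skip_count, node, task):
    """Decide whether `task` may launch on `node` at the current heartbeat."""
    if node.host in task.preferred_hosts:
        return True                      # node-local: launch immediately
    if skip_count >= RACK_LOCAL_SKIPS:
        return True                      # degraded twice: run anywhere
    if skip_count >= NODE_LOCAL_SKIPS and node.rack in task.preferred_racks:
        return True                      # degraded once: rack-local allowed
    return False                         # skip again; wait for a closer node
```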
CN201710181055.0A 2017-03-24 2017-03-24 Dynamic label matching DLMS scheduling method under Hadoop platform Expired - Fee Related CN107038069B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710181055.0A CN107038069B (en) 2017-03-24 2017-03-24 Dynamic label matching DLMS scheduling method under Hadoop platform

Publications (2)

Publication Number Publication Date
CN107038069A CN107038069A (en) 2017-08-11
CN107038069B true CN107038069B (en) 2020-05-08

Family

ID=59534217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710181055.0A Expired - Fee Related CN107038069B (en) 2017-03-24 2017-03-24 Dynamic label matching DLMS scheduling method under Hadoop platform

Country Status (1)

Country Link
CN (1) CN107038069B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766150A (en) * 2017-09-20 2018-03-06 电子科技大学 A kind of job scheduling algorithm based on hadoop
CN108052443A (en) * 2017-10-30 2018-05-18 北京奇虎科技有限公司 A kind of test assignment dispatching method, device, server and storage medium
CN107832153B (en) * 2017-11-14 2020-12-29 北京科技大学 Hadoop cluster resource self-adaptive allocation method
CN107832134B (en) * 2017-11-24 2021-07-20 平安科技(深圳)有限公司 Multitasking method, application server and storage medium
CN108509280B (en) * 2018-04-23 2022-05-31 南京大学 Distributed computing cluster locality scheduling method based on push model
CN110532085B (en) * 2018-05-23 2022-11-04 阿里巴巴集团控股有限公司 Scheduling method and scheduling server
CN108959580A (en) * 2018-07-06 2018-12-07 深圳市彬讯科技有限公司 A kind of optimization method and system of label data
CN109375992A (en) * 2018-08-17 2019-02-22 华为技术有限公司 A kind of resource regulating method and device
CN109656699A (en) * 2018-12-14 2019-04-19 平安医疗健康管理股份有限公司 Distributed computing method, device, system, equipment and readable storage medium storing program for executing
CN111930493B (en) * 2019-05-13 2023-08-01 中国移动通信集团湖北有限公司 NodeManager state management method and device in cluster and computing equipment
CN110278257A (en) * 2019-06-13 2019-09-24 中信银行股份有限公司 A kind of method of mobilism configuration distributed type assemblies node label
CN111124765A (en) * 2019-12-06 2020-05-08 中盈优创资讯科技有限公司 Big data cluster task scheduling method and system based on node labels
CN112039709B (en) * 2020-09-02 2022-01-25 北京首都在线科技股份有限公司 Resource scheduling method, device, equipment and computer readable storage medium
CN112445925B (en) * 2020-11-24 2022-08-26 浙江大华技术股份有限公司 Clustering archiving method, device, equipment and computer storage medium
CN113590294B (en) * 2021-07-30 2023-11-17 北京睿芯高通量科技有限公司 Self-adaptive and rule-guided distributed scheduling method
CN115904645A (en) * 2021-09-30 2023-04-04 华为技术有限公司 Method, apparatus, device and medium for task scheduling
WO2023056618A1 (en) * 2021-10-09 2023-04-13 国云科技股份有限公司 Cross-cloud platform resource scheduling method and apparatus, terminal device, and storage medium
CN114064294B (en) * 2021-11-29 2022-10-04 郑州轻工业大学 Dynamic resource allocation method and system in mobile edge computing environment
CN114840343A (en) * 2022-05-16 2022-08-02 江苏安超云软件有限公司 Task scheduling method and system based on distributed system
CN117056061B (en) * 2023-10-13 2024-01-09 浙江远算科技有限公司 Cross-supercomputer task scheduling method and system based on container distribution mechanism

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915407A (en) * 2015-06-03 2015-09-16 华中科技大学 Resource scheduling method under Hadoop-based multi-job environment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8756595B2 (en) * 2011-07-28 2014-06-17 Yahoo! Inc. Method and system for distributed application stack deployment

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915407A (en) * 2015-06-03 2015-09-16 华中科技大学 Resource scheduling method under Hadoop-based multi-job environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Research on Optimization of Job Scheduling Algorithms for Cloud Computing Platforms"; Xu Peng; China Masters' Theses Full-text Database, Information Science and Technology; 20140815; pp. I138-40 *

Also Published As

Publication number Publication date
CN107038069A (en) 2017-08-11

Similar Documents

Publication Publication Date Title
CN107038069B (en) Dynamic label matching DLMS scheduling method under Hadoop platform
US20170255496A1 (en) Method for scheduling data flow task and apparatus
US8812639B2 (en) Job managing device, job managing method and job managing program
CN114741207B (en) GPU resource scheduling method and system based on multi-dimensional combination parallelism
US9542223B2 (en) Scheduling jobs in a cluster by constructing multiple subclusters based on entry and exit rules
US9092266B2 (en) Scalable scheduling for distributed data processing
WO2011076608A2 (en) Goal oriented performance management of workload utilizing accelerators
US20060195845A1 (en) System and method for scheduling executables
Pakize A comprehensive view of Hadoop MapReduce scheduling algorithms
CN108509280B (en) Distributed computing cluster locality scheduling method based on push model
CN110990154B (en) Big data application optimization method, device and storage medium
Pongsakorn et al. Container rebalancing: Towards proactive linux containers placement optimization in a data center
Ahmed et al. A hybrid and optimized resource scheduling technique using map reduce for larger instruction sets
CN113946431B (en) Resource scheduling method, system, medium and computing device
CN110084507B (en) Scientific workflow scheduling optimization method based on hierarchical perception in cloud computing environment
US20210390405A1 (en) Microservice-based training systems in heterogeneous graphic processor unit (gpu) cluster and operating method thereof
KR101640231B1 (en) Cloud Driving Method for supporting auto-scaled Hadoop Distributed Parallel Processing System
CN113255165A (en) Experimental scheme parallel deduction system based on dynamic task allocation
Li et al. On scheduling of high-throughput scientific workflows under budget constraints in multi-cloud environments
CN115827237A (en) Storm task scheduling method based on cost performance
Khalil et al. Survey of Apache Spark optimized job scheduling in Big Data
CN112783651B (en) Load balancing scheduling method, medium and device for vGPU of cloud platform
CN111522637B (en) Method for scheduling storm task based on cost effectiveness
CN116932156A (en) Task processing method, device and system
Seethalakshmi et al. Job scheduling in big data-a survey

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200508