WO2018157768A1 - Method, device, and running device for scheduling a running device - Google Patents

Method, device, and running device for scheduling a running device

Info

Publication number
WO2018157768A1
WO2018157768A1 PCT/CN2018/077191 CN2018077191W WO2018157768A1 WO 2018157768 A1 WO2018157768 A1 WO 2018157768A1 CN 2018077191 W CN2018077191 W CN 2018077191W WO 2018157768 A1 WO2018157768 A1 WO 2018157768A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
control device
partition
running
test
Prior art date
Application number
PCT/CN2018/077191
Inventor
朱韧
周伟
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2018157768A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • The present application relates to the field of communications, and more particularly to a method, a device, and a running device for scheduling a running device.
  • IT: Information Technology
  • Cloud computing provides convenient, on-demand access to a shared pool of computing resources.
  • The resources included in the computing resource sharing pool may be referred to as system resources, and include networks, servers, storage, application software, and services.
  • Existing resource scheduling schemes manage system resources and job scheduling centrally; they cannot detect the resource usage of the entire system in real time and therefore cannot obtain a better-optimized scheduling scheme.
  • The embodiments of the present application provide a method, a device, and a running device for scheduling a running device, which manage system resources by partition, can detect the resource usage of the entire system in real time, schedule the running devices reasonably, and utilize system resources reasonably.
  • A first aspect provides a method for scheduling a running device, including: sending a test task to the control devices of a plurality of partitions in a cluster, each of the plurality of partitions including at least one running device; acquiring the test results of the test task sent by the control devices of the plurality of partitions; selecting, according to the test results, a first partition for executing a first task from the plurality of partitions in the cluster, the first task being a task to be executed; and sending the first task to the control device of the first partition, so that the control device selects, from the first partition, a target running device that performs the first task.
  • The central control device and the control devices of different partitions may be the same device.
  • The central control device sends a test task to the control devices of the multiple partitions in the cluster, so that the running devices are scheduled in units of partitions. Using the test results of the test task, the central control device selects from the cluster an appropriate partition for executing the first task, so that the system resources in the cluster are rationally utilized.
  • Sending the test task to the control devices of the multiple partitions in the cluster includes: encapsulating the first task as the test task; and sending the test task to the control device of each of the partitions.
  • the central control device and/or the control device of each of the partitions may belong to the cluster.
  • the central control device and/or the control device of each of the partitions may not belong to the cluster.
  • The central control device may randomly select the control device of at least one partition among the plurality of partitions and send the encapsulated first task to it.
  • The test result includes at least one of: a running-state parameter value during the test of the first task, the maximum interference strength that the first task can withstand, and the interference strength that the first task generates on the running device.
  • The running-state parameters of the first-task test include one or more of the following: delay, queries per second, response time, and throughput.
  • The central control device selects one or more running-state parameters that affect the first task as the indicator for selecting the first partition to perform the first task.
  • The central control device selects the first partition to perform the first task according to how well the maximum interference strength that the first task can withstand matches the interference strength of each partition.
  • The central control device selects the first partition to perform the first task according to the interference strength that the first task generates on the running device and the interference strength of each partition.
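  • The text does not define how the "matching degree" between a task's interference tolerance and a partition's interference strength is computed. The following is a minimal Python sketch under one assumed policy (best fit): among the partitions whose interference strength does not exceed the task's tolerance, pick the one closest to that tolerance. The function name, the numeric scale, and the policy itself are illustrative assumptions, not taken from the application.

```python
def pick_by_interference(tolerance, partition_interference):
    """Best-fit match (assumed policy, not specified by the source).

    tolerance: maximum interference strength the task can withstand.
    partition_interference: dict mapping partition name -> interference strength.
    Returns the feasible partition closest to the tolerance, or None.
    """
    # Keep only partitions the task can tolerate.
    feasible = {
        name: strength
        for name, strength in partition_interference.items()
        if strength <= tolerance
    }
    if not feasible:
        return None
    # Highest interference that still fits leaves quieter partitions free.
    return max(feasible, key=feasible.get)


levels = {"partition-1": 0.2, "partition-2": 0.5, "partition-3": 0.8}
chosen = pick_by_interference(0.6, levels)
nothing = pick_by_interference(0.1, levels)
```

  • A different policy (for example, always choosing the quietest partition) would fit the same description; the choice here only illustrates that the comparison is between a per-task tolerance and a per-partition interference level.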
  • The central control device may further compute a weighted average of a plurality of parameter values from the performance test, such as delay, queries per second, response time, throughput, and the maximum interference strength that the first task can withstand, and select the first partition according to the weighted average.
  • The central control device selects a partition for performing the first task from the plurality of partitions according to the parameter values obtained by performance-testing the first task in each of the plurality of partitions, and can therefore select the partition that executes the first task more accurately, so that system resources are rationally utilized.
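  • The weighted-average selection can be sketched as follows. This is a minimal Python illustration, not the applicant's implementation: the weights, the min-max normalisation, the field names, and the sample figures are all assumptions introduced here, since the application names the parameters but gives no formula.

```python
from dataclasses import dataclass


@dataclass
class TestResult:
    partition: str
    delay_ms: float      # lower is better
    qps: float           # queries per second, higher is better
    response_ms: float   # lower is better
    throughput: float    # higher is better


# Hypothetical weights; the source does not specify any values.
WEIGHTS = {"delay_ms": 0.3, "qps": 0.3, "response_ms": 0.2, "throughput": 0.2}
LOWER_IS_BETTER = {"delay_ms", "response_ms"}


def normalise(values, lower_is_better):
    """Min-max scale to [0, 1] so that 1.0 is always the best value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if lower_is_better else scaled


def select_partition(results):
    """Return (best partition, per-partition weighted scores)."""
    scores = {r.partition: 0.0 for r in results}
    for metric, weight in WEIGHTS.items():
        column = [getattr(r, metric) for r in results]
        normalised = normalise(column, metric in LOWER_IS_BETTER)
        for r, value in zip(results, normalised):
            scores[r.partition] += weight * value
    return max(scores, key=scores.get), scores


results = [
    TestResult("p1", delay_ms=20, qps=900, response_ms=35, throughput=400),
    TestResult("p2", delay_ms=12, qps=1200, response_ms=30, throughput=380),
    TestResult("p3", delay_ms=30, qps=700, response_ms=50, throughput=300),
]
best, scores = select_partition(results)
```

  • Normalising before weighting keeps parameters with large raw values (such as queries per second) from dominating parameters measured in milliseconds.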
  • Before sending the test task to the control devices of the multiple partitions in the cluster, the method further includes: acquiring resource information and/or interference information of a plurality of running devices in the cluster, where the resource information indicates the resources that can be used on a running device, and the interference information includes the interference intensity that tasks on a running device generate on that running device; dividing the plurality of running devices into the plurality of partitions according to the resource information and/or interference information of the plurality of running devices; and assigning a control device to each of the partitions.
  • The information about the resources that can be used on a running device includes the usage rate of a resource, the remaining rate of a resource, the resources that have been used, or the resources that can be used.
  • The interference intensity that tasks on a running device generate on that running device includes the interference intensity generated by tasks that have been executed or are being executed on the running device.
  • the central control device and/or the control device of each partition may belong to the cluster.
  • the central control device and/or the control device of each partition may not belong to the cluster.
  • The central control device divides the plurality of running devices into partitions and allocates a control device to each partition, so that resources are managed in units of partitions and the central control device no longer centrally manages system resources and task scheduling. This solves the problem that the central control device becomes a system bottleneck as system resources are expanded.
  • When the target running device completes the first task, the method further includes: the central control device acquiring updated resource information and/or interference information of the target running device; and updating the partition to which the target running device belongs from the first partition to a second partition according to the relationship between the updated resource information and/or interference information sent by the target running device and the value ranges of the plurality of partitions.
  • the central control device receives the updated resource information sent by the running device.
  • the running device sends the updated resource information to the control device of the first partition, and the control device of the first partition sends the updated resource information to the central control device.
  • Updating the partition to which the target running device belongs from the first partition to the second partition includes: the central control device sending first indication information to the control device of the first partition, where the first indication information instructs the control device of the first partition to delete the information of the target running device; and the central control device sending second indication information to the target running device, where the second indication information instructs the target running device to change the partition to which it belongs from the first partition to the second partition.
  • The central control device sends first indication information to the target running device, where the first indication information instructs the target running device to change the partition to which it belongs from the first partition to the second partition;
  • the central control device sends second indication information to the control device of the first partition, where the second indication information instructs the control device of the first partition to delete the information of the target running device.
  • The central control device sends first indication information to the control device of the first partition, where the first indication information instructs the control device of the first partition to delete the information of the target running device;
  • after the central control device receives second indication information sent by the control device of the first partition, the central control device sends third indication information to the target running device, where the second indication information indicates that the control device of the first partition has deleted the information of the target running device, and the third indication information instructs the target running device to change the partition to which it belongs from the first partition to the second partition.
  • By acquiring the updated resource information of the running devices, the central control device can detect the usage of the resources of the entire system in real time, and it re-partitions the running devices according to the updated resource information and/or interference information. Partitions are thus divided dynamically, the running devices are reasonably scheduled, and system resources are rationally used.
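  • The three-message variant of the partition update (delete request, delete acknowledgement, change-partition instruction) can be sketched in Python as below. All class names, method names, the tuple used as the acknowledgement, and the usage-rate ranges are hypothetical; in particular, registering the device with the new control device by a direct call is an assumption standing in for the running device reporting to the control device of the second partition.

```python
class ControlDevice:
    """Control device of one partition; tracks the running devices it manages."""

    def __init__(self, partition):
        self.partition = partition
        self.devices = set()

    def handle_first_indication(self, device_id):
        # Delete the device record, then acknowledge (second indication).
        self.devices.discard(device_id)
        return ("deleted", device_id)


class RunningDevice:
    def __init__(self, device_id, partition):
        self.device_id = device_id
        self.partition = partition

    def handle_third_indication(self, new_partition):
        # Switch the partition this device belongs to.
        self.partition = new_partition


class CentralControlDevice:
    def __init__(self, ranges, controllers):
        self.ranges = ranges            # partition -> (low, high) usage range
        self.controllers = controllers  # partition -> ControlDevice

    def partition_for(self, usage):
        for partition, (low, high) in self.ranges.items():
            if low <= usage < high:
                return partition
        raise ValueError(f"usage {usage} is outside every partition range")

    def on_resource_update(self, device, usage):
        new, old = self.partition_for(usage), device.partition
        if new == old:
            return old
        ack = self.controllers[old].handle_first_indication(device.device_id)
        if ack == ("deleted", device.device_id):
            device.handle_third_indication(new)
            # Assumption: stands in for the device reporting to its new controller.
            self.controllers[new].devices.add(device.device_id)
        return device.partition


controllers = {"p1": ControlDevice("p1"), "p2": ControlDevice("p2")}
controllers["p1"].devices.add("dev-120")
central = CentralControlDevice({"p1": (0.0, 0.5), "p2": (0.5, 1.01)}, controllers)
device = RunningDevice("dev-120", "p1")
result = central.on_resource_update(device, 0.8)
```

  • Waiting for the acknowledgement before instructing the running device means the old control device never schedules work onto a device that has already left its partition.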
  • The resource information of the running device includes at least one of the following: storage information of the running device; central processor information of the running device; network information of the running device; heterogeneous acceleration information of the running device; and interference information of the running device.
  • Dividing the multiple running devices into the multiple partitions according to the resource information and/or the interference information of the multiple running devices includes: assigning each running device to the corresponding partition according to the relationship between the resource information parameter value of each of the running devices and the value ranges of the multiple partitions.
  • Each running device is assigned to the corresponding partition according to the relationship between an information parameter value of each running device and the value ranges of the plurality of partitions.
  • The central control device partitions the plurality of running devices in the cluster according to the relationship between the resource information and/or the interference information of the running devices and the value ranges of the plurality of partitions, thereby forming a plurality of partitions. System resource management and job scheduling are then performed in units of partitions, so that the running devices are reasonably scheduled and system resources are reasonably utilized.
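  • Assigning devices to partitions by value range amounts to bucketing a parameter value against a set of boundaries. The Python sketch below uses the standard-library `bisect` module; the boundary values, the choice of CPU usage rate as the parameter, and the device names are illustrative assumptions only.

```python
from bisect import bisect_right

# Hypothetical boundaries of CPU-usage ranges, one partition per range:
# partition 0: [0.0, 0.3), partition 1: [0.3, 0.7), partition 2: [0.7, 1.0]
BOUNDS = [0.3, 0.7]


def partition_of(usage):
    """Index of the value range that contains the given usage rate."""
    return bisect_right(BOUNDS, usage)


def divide(devices):
    """Group devices (name -> usage rate) into partitions by value range."""
    partitions = {}
    for name, usage in devices.items():
        partitions.setdefault(partition_of(usage), []).append(name)
    return partitions


layout = divide({"dev-0": 0.10, "dev-1": 0.25, "dev-2": 0.50, "dev-3": 0.90})
```

  • When several parameters are used (for example resource usage and interference intensity together), each could be bucketed this way and the tuple of bucket indices used as the partition key; the application leaves that combination open.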
  • the method before the sending the test task to the control device of the multiple partitions in the cluster, the method further includes:
  • Sending the first task to the control device of the first partition comprises: sending the first task to the control device of the first partition by calling the first computing framework.
  • A second aspect provides a method for scheduling a running device, comprising: a control device receiving a test task sent by a central control device, the control device being configured to manage at least one running device in a first partition of the plurality of partitions in the cluster; selecting at least one running device in the partition to perform the test of the test task; receiving the test parameter values of the test task sent by the running device that performs the test; and sending the test parameter values of the test task to the central control device, for the central control device to determine the first partition for executing the first task.
  • The control device receives the test task sent by the central control device, selects at least one running device in the partition to perform a performance test of the test task, and receives the test parameter values from the running device that performs the performance test, so that the running devices are reasonably scheduled and system resources are reasonably used.
  • The control device receives the first task sent by the central control device, selects a target running device that performs the first task in the first partition, and schedules the target running device to perform the first task.
  • Selecting the target running device that performs the first task in the first partition includes: selecting the target running device from the at least one running device that tested the first task, according to the test parameter values of that at least one running device.
  • The control device may alternatively select an arbitrary running device in the first partition to execute the first task.
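  • The two selection options open to the control device (choose by test parameter values, or choose arbitrarily) can be sketched as follows. This is a hedged Python illustration: using response time as the deciding parameter and "lowest wins" as the rule are assumptions, as are the function and device names.

```python
import random


def select_target(test_values, partition_devices, rng=random):
    """Pick the running device that will execute the first task.

    test_values: device -> measured response time in ms, for the devices
        that actually ran the test task (may be empty).
    partition_devices: all running devices in the partition.
    """
    if test_values:
        # Assumed rule: the tested device with the lowest response time wins.
        return min(test_values, key=test_values.get)
    # Fallback: arbitrary choice among the devices of the partition.
    return rng.choice(sorted(partition_devices))


tested = {"dev-11": 42.0, "dev-12": 35.5}
target = select_target(tested, ["dev-11", "dev-12", "dev-13"])
fallback = select_target({}, ["dev-11"])
```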
  • The method further includes: receiving first indication information sent by the central control device, where the first indication information instructs the control device to delete the information of the running device that performed the first task; and sending second indication information to the central control device, where the second indication information indicates that the control device has deleted the information of the target running device.
  • A third aspect provides a method for scheduling a running device, including: a running device receiving a test task sent by a control device, where the control device is configured to manage a first partition in which the running device is located, and the first partition includes at least one running device;
  • the running device sending the test result to the control device, for the control device to send the test result to the central control device so that the central control device selects the partition that performs the first task; and the running device executing the first task sent by the control device. In this way, the central control device schedules the running devices reasonably, a suitable running device is selected for the first task, and computing resources are rationally utilized.
  • The running device receives the first task sent by the control device and performs the first task.
  • Before the running device performs the first task sent by the control device, the method further includes: sending the resource information of the running device to the central control device or to the control device to which the running device belongs, for the central control device to partition the running device.
  • The method further includes: receiving first indication information sent by the central control device or by the control device to which the running device belongs, where the first indication information instructs the running device to update the partition to which it belongs from the first partition to the second partition; and sending the resource information of the running device to the control device of the second partition.
  • A fourth aspect provides a system for scheduling a running device, the system comprising a central control device, a plurality of control devices, and a plurality of running devices, wherein the plurality of running devices are divided into a plurality of partitions, each of the plurality of partitions includes at least one running device, and each of the plurality of control devices controls one of the plurality of partitions;
  • the central control device is configured to: send a test task to the plurality of control devices; receive the test results sent by the plurality of control devices; select, according to the test results, a first partition for performing a task from the plurality of partitions; and send the task to the control device of the first partition;
  • the control device is configured to: receive the test task sent by the central control device and send the test task to at least some of the running devices in the partition it controls; receive the test results sent by those running devices and send the test results to the central control device; and receive the task sent by the central control device, select a target running device that performs the task in the controlled partition, and schedule the target running device in the controlled partition to perform the task;
  • the running device is configured to: receive the test task sent by its control device, test the test task to obtain the test result of the test task, and send the test result to its control device; and execute the task according to the scheduling of the control device.
  • The central control device uses the test results of the test task to select, from the plurality of partitions in the cluster, a reasonable partition for executing the first task, so that the system resources in the cluster are rationally utilized and the running devices are reasonably scheduled.
  • The central control device is further configured to encapsulate the task to obtain the test task.
  • The test result includes at least one of: a running-state parameter value during the test of the task, the maximum interference strength that the task can withstand, and the interference strength that the task generates on the running device.
  • The central control device is further configured to: obtain resource information and/or interference information of the multiple running devices, where the resource information indicates the resources that can be used on a running device, and the interference information includes the interference strength that tasks on a running device generate on that running device; divide the plurality of running devices into the multiple partitions according to the resource information and/or interference information of the multiple running devices; and assign, from among the plurality of control devices, a control device to each of the plurality of partitions.
  • the running device is further configured to: send the updated resource information and/or interference information of the running device to the central control device or the control device to which the running device belongs;
  • the control device is further configured to: receive the updated resource information and/or interference information of the running device sent by the running device, and send the updated resource information and/or interference information of the running device to the central control device;
  • the central control device is further configured to: receive the updated resource information and/or interference information of the target running device sent by the running device or by the control device to which the running device belongs; and update the partition of the running device according to the relationship between the updated resource information and/or interference information and the value ranges of the multiple partitions.
  • A fifth aspect provides a central control device for performing the method of the first aspect or any possible implementation of the first aspect.
  • Specifically, the central control device comprises modular units for performing the method of the first aspect or any possible implementation of the first aspect.
  • A sixth aspect provides a control device for performing the method of the second aspect or any possible implementation of the second aspect.
  • Specifically, the control device comprises modular units for performing the method of the second aspect or any possible implementation of the second aspect.
  • A seventh aspect provides a running device for performing the method of the third aspect or any possible implementation of the third aspect.
  • Specifically, the running device comprises modular units for performing the method of the third aspect or any possible implementation of the third aspect.
  • An eighth aspect provides a central control device for performing the method of the first aspect or any possible implementation of the first aspect, the central control device comprising a processor, a memory, and a transceiver, where the processor is configured to invoke instructions stored in the memory to perform the method of the first aspect or any optional implementation thereof.
  • A ninth aspect provides a control device for performing the method of the second aspect or any possible implementation of the second aspect, the control device comprising a processor, a memory, and a transceiver, where the processor performs the method of the second aspect or any optional implementation thereof by invoking instructions stored in the memory.
  • A tenth aspect provides a running device for performing the method of the third aspect or any possible implementation of the third aspect, the running device comprising a processor, a memory, and a transceiver, where the processor performs the method of the third aspect or any optional implementation thereof by invoking instructions stored in the memory.
  • An eleventh aspect provides a computer readable medium for storing a computer program, the computer program comprising instructions for performing the method of the first aspect or any possible implementation of the first aspect, the second aspect or any possible implementation of the second aspect, and the third aspect or any possible implementation of the third aspect.
  • FIG. 1 is a schematic diagram of a cluster communication system in accordance with an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a method for scheduling a running device according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of dividing running devices into partitions according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a method for scheduling a running device according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of dynamic partition division according to an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of a central control device in accordance with an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of a central control device in accordance with an embodiment of the present application.
  • FIG. 8 is a schematic block diagram of a control device according to an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of a running device in accordance with an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a communication device according to an embodiment of the present application.
  • FIG. 1 is a schematic diagram of a cluster communication system according to an embodiment of the present application.
  • The system 100 includes a central control device, control devices, and running devices.
  • The central control device comprises a central control device 101; the control devices comprise a control device 110 and a control device 111; and the running devices comprise a running device 120, a running device 121, a running device 122, and a running device 123.
  • All the running devices form a cluster, and all the resources of the running devices form a resource sharing pool.
  • The resources of the resource sharing pool include the central processor resources of each running device, the disk array resources of each running device, the Solid State Drive (SSD) storage resources of each running device, the network resources of each running device, and the heterogeneous acceleration resources of each running device.
  • Network resources include network topology and network bandwidth.
  • Heterogeneous acceleration resources include resources such as GPU, GPGPU, GPDSP, ASIC, FPGA, and other types of many-core processors.
  • The central control device and/or the control devices may also be within the cluster; that is, a running device within the cluster may be selected as the central control device of the entire cluster or as the control device of a partition.
  • The running device 120 and the running device 121 belong to the same partition, and the control device 110 performs resource management and task scheduling for that partition.
  • The running device 122 and the running device 123 belong to the same partition, and the control device 111 performs resource management and task scheduling for that partition.
  • The central control device 101 is configured to select a partition for a task and send the task to the control device of the selected partition; the control device of the partition schedules a running device within the partition to perform the task.
  • A partition is a logical area that includes some of the running devices in the cluster: the resource usage of the running devices in the same partition is in the same range, and/or the interference intensity that tasks on the running devices in the same partition generate on those running devices is in the same range.
  • the central control device carries various computing frameworks, such as Hadoop, Spark, MPI, and Storm.
  • Hadoop is a computing framework for distributed data processing, used for offline processing of large volumes of data.
  • Spark is a parallel computing framework based on in-memory computing that keeps data in memory as much as possible to improve the computational efficiency of iterative and interactive applications; it cannot be used to process data that needs to be preserved for a long time.
  • MPI is a parallel computing framework based on message passing. It is suitable for parallel computing of various complex applications and supports multiple programs and multiple data.
  • Storm is a computing framework for online real-time processing. It does not collect and store data; it receives data directly over the network in real time and processes the data in real time.
  • the user submits the task through the interface on the computing framework, and the central control device starts the corresponding computing framework according to different task types.
  • Various computing frameworks are used to manage tasks and send the tasks to the control devices of the partition.
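  • Starting a computing framework according to task type is essentially a lookup from task type to framework. The Python sketch below only illustrates that mapping: the task-type labels are hypothetical (the application does not enumerate task types), and the pairing follows the framework descriptions given in the preceding paragraphs.

```python
# Hypothetical task-type labels mapped to the frameworks named in the text.
FRAMEWORK_BY_TASK_TYPE = {
    "offline_batch": "Hadoop",     # offline processing of large data volumes
    "iterative": "Spark",          # iterative in-memory computation
    "interactive": "Spark",        # interactive applications
    "message_passing": "MPI",      # complex parallel applications
    "realtime_stream": "Storm",    # online real-time processing
}


def framework_for(task_type):
    """Return the name of the framework the central control device would start."""
    try:
        return FRAMEWORK_BY_TASK_TYPE[task_type]
    except KeyError:
        raise ValueError(f"no computing framework for task type {task_type!r}") from None


chosen_framework = framework_for("realtime_stream")
```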
  • The running device may be a physical server, a virtual machine, or a container.
  • The central control device can also manage other control devices, and each control device can schedule not only two running devices but also one, or three or more, running devices.
  • FIG. 2 is a schematic flowchart of a method 200 for running device scheduling according to an embodiment of the present application.
  • FIG. 2 shows two control devices, namely control device 1 and control device 2.
  • Control device 1 controls running device 11, and control device 2 controls running device 21. This is for convenience of description and should not constitute a particular limitation on the embodiments of the present application.
  • The central control device can also manage other control devices, and each control device can control not only one running device but also a plurality of running devices.
  • The method 200 includes the following.
  • The test task is sent to the control devices of a plurality of partitions in the cluster, the test task being a test task of the first task, and each of the plurality of partitions including at least one running device.
  • The central control device transmits the test task to control device 1 and control device 2.
  • Before sending the test task to the control devices of the plurality of partitions in the cluster, the method further includes acquiring the first task.
  • the user submits the first task on the central control device.
  • the user submits the first task on the client, and the client sends the first task to the central control device.
  • the first task includes a long task, a batch computing task, and the like.
  • A long task is a task that runs on the platform for a long time, such as a web service program;
  • a batch task is a task that performs a large amount of computation at once but runs for a short time, such as Hadoop big data processing.
  • The central control device may have partitioned the multiple running devices within the cluster before it obtains the first task, as shown in FIG. 3.
  • FIG. 3 is a schematic diagram of the partitioning of running devices in a method for running device scheduling according to an embodiment of the present application.
  • the central control device acquires resource information and/or interference information of multiple running devices in the cluster, partitions the running devices in the cluster, and allocates control devices of each partition.
  • Running device 0, running device 1, and running device 2 belong to partition 1, which is managed by control device 1; running device 3, running device 4, and running device 5 belong to partition 2, which is managed by control device 2; running device 7, running device 8, and running device 9 belong to partition 3, which is managed by control device 3.
  • the plurality of running devices in the cluster may collect resource information and/or interference information of the running device through the proxy plug-in, and send resource information and/or interference information of the running device to the central control device.
  • The resource information is used to indicate the resources that can be used in the running device, and the interference information includes the interference strength that a task on the running device generates on the running device.
  • The available resources of the running device mentioned in the embodiment of the present application include at least one of a central processor resource of the running device, a disk array resource of the running device, a solid state drive (SSD) storage resource of the running device, a network resource of the running device, and a heterogeneous acceleration resource of the running device.
  • The status of the resources that can be used in the running device includes the resources that the running device can use, the resources that have been used, the usage rate of the resources, or the remaining rate of the resources.
  • the central control device acquires interference information of multiple running devices in the cluster, and may partition the running devices in the cluster according to interference information of multiple running devices.
  • The interference information may include the interference strength that a task running in the running device generates on the central processor, the memory, the network, and the like of the running device. Different interference affects a task differently, so a partition that interferes less with the first task may be selected to perform the first task.
  • The interference that the first task causes to the running device and the interference that the first task itself can withstand are obtained by using the application resource interference model. The application resource interference model is a software program that describes the interference an application causes to system resources (15 types) and the interference the application itself can withstand. As shown in Table 1:
  • T1_C_T indicates the CPU interference value that task T_1 can withstand.
  • T1_C_C indicates the interference value that task T_1 generates on the CPU. Only some resources are listed in Table 1.
  • The first task T_1 is run alone to obtain a certain performance indicator; for example, the web application server is run alone to obtain the number of central processor calculations per second. The central processor interference program in the basic interference program SOI is run alone to obtain its number of central processor calculations per second. The first task T_1 and the central processor interference program in the basic interference program SOI are then run simultaneously, and the interference strength is adjusted until the calculation rate per second of T_1 is reduced to 95% of the original performance indicator.
  • At that point, the interference strength of the basic interference program SOI is the central processor interference that T_1 can withstand. At the same time, the change in the interference strength of the basic interference program SOI during this process is detected; this change is the interference strength that the web application generates on the central processor.
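  • The calibration of the tolerable interference described above can be sketched as follows. This is only an illustrative sketch: the function names, the integer interference levels from 0 to 100, and the linear `toy_perf` model are assumptions made for the example and do not come from the application. Only the 95% threshold is taken from the text.

```python
from typing import Callable


def find_tolerable_interference(perf_with_interference: Callable[[int], float],
                                baseline_perf: float,
                                keep_percent: int = 95,
                                max_level: int = 100) -> int:
    """Return the lowest interference level (0..max_level) at which the
    task's performance has dropped to keep_percent of its baseline."""
    threshold = baseline_perf * keep_percent / 100
    for level in range(max_level + 1):
        if perf_with_interference(level) <= threshold:
            return level
    return max_level  # performance never dropped to the threshold


def toy_perf(level: int) -> float:
    # Hypothetical model: each interference level costs 5 calculations per second.
    return 1000.0 - 5.0 * level


# Baseline measured with the task running alone (hypothetical value).
tolerable_level = find_tolerable_interference(toy_perf, baseline_perf=1000.0)
print(tolerable_level)
```

  • In a real measurement, `perf_with_interference` would run the task together with the interference program and return the measured indicator; here it is replaced by a model so that the sketch is self-contained.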
  • The central control device divides each running device into the corresponding partition according to the relationship between the resource information parameter value and/or the interference information of each of the plurality of running devices and a plurality of partition value ranges.
  • The partitions may be divided according to the central processor usage rate of the running device: when the usage rate is less than 30%, the running device is assigned to partition 1; when the usage rate is greater than or equal to 30% and less than or equal to 60%, it is assigned to partition 2; when the usage rate is greater than 60%, it is assigned to partition 3.
  • The partitions may also be divided according to the solid state drive storage rate: when the storage rate of the running device is less than 30%, the running device is assigned to partition 1; when it is greater than or equal to 30% and less than or equal to 70%, it is assigned to partition 2; when it is greater than 70%, it is assigned to partition 3.
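  • The two threshold rules above can be expressed as a small lookup. The sketch below uses the percentage boundaries given in the text; the function and constant names are hypothetical.

```python
from typing import Tuple


def partition_for(rate_percent: float, bounds: Tuple[float, float]) -> int:
    """Map a usage or storage rate (in percent) to partition 1, 2 or 3.

    Values below the lower bound go to partition 1, values between the
    bounds (inclusive) go to partition 2, and values above the upper
    bound go to partition 3.
    """
    low, high = bounds
    if rate_percent < low:
        return 1
    if rate_percent <= high:
        return 2
    return 3


CPU_BOUNDS = (30.0, 60.0)  # central processor usage rate boundaries from the text
SSD_BOUNDS = (30.0, 70.0)  # solid state drive storage rate boundaries from the text

print(partition_for(45.0, CPU_BOUNDS))  # falls in the 30%..60% range
print(partition_for(75.0, SSD_BOUNDS))  # above 70%
```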
  • A weighted average method may be used to weight and average the central processor resource, the disk array resource, the solid state drive storage resource, the network resource, the heterogeneous acceleration resource, and the interference information, and the running devices are divided into partitions according to the resulting weighted average.
  • For example, the partitions are divided jointly according to the usage rate of the central processor and the storage rate of the solid state drive.
  • A weighted average method may be used to calculate the weighted average of the usage rate of the central processor and the storage rate of the solid state drive, and the partitions are divided according to this weighted average.
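  • A minimal sketch of the joint division follows. The weights, the partition boundaries applied to the weighted average, and all names are assumptions chosen for illustration; the application does not specify them.

```python
from typing import Dict


def weighted_average(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the named values, normalised by the total weight."""
    total_weight = sum(weights[name] for name in values)
    return sum(values[name] * weights[name] for name in values) / total_weight


def partition_for_score(score: float, low: float = 30.0, high: float = 60.0) -> int:
    # Hypothetical boundaries for the combined score, in percent.
    if score < low:
        return 1
    if score <= high:
        return 2
    return 3


# Hypothetical equal weighting of the two rates.
WEIGHTS = {"cpu_usage": 0.5, "ssd_storage_rate": 0.5}
device = {"cpu_usage": 20.0, "ssd_storage_rate": 50.0}

score = weighted_average(device, WEIGHTS)
partition = partition_for_score(score)
print(score, partition)
```

  • With equal weights, a device that is lightly loaded on the central processor but moderately full on the solid state drive lands in the middle partition, which neither single-metric rule would show on its own.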
  • the embodiment of the present application is only described by taking the resource information including the foregoing five types of information and interference information as an example, but the embodiment of the present application is not limited thereto, and the resource information may further include other information.
  • The central control device may itself divide each running device into the corresponding partition, according to the relationship between the resource information parameter value of each of the plurality of running devices and a plurality of partition value ranges, or according to the relationship between the interference information of each of the plurality of running devices and the plurality of partition value ranges.
  • The central control device may also send the resource information parameters or the interference information of each running device to a device having a partitioning function. The device having the partitioning function divides each running device into the corresponding partition according to the same relationships, and the central control device receives the partition result of the plurality of running devices sent by the device having the partitioning function.
  • The central control device may allocate a control device to each partition in the cluster, where the control device is used to manage the partition. The control device may or may not be a running device in the partition.
  • The central control device may or may not belong to the cluster.
  • The central control device encapsulates the first task as the test task, where encapsulating the first task means compressing and packaging the first task. The test task is used by a running device to test the performance it would have when performing the first task; the running device does not directly run the first task.
  • The control device receives the test task sent by the central control device.
  • Control device 1 and control device 2 each receive the test task transmitted by the central control device.
  • The control device selects at least one of the running devices in the partition to test the test task.
  • The control device may select a plurality of running devices in the partition it manages to perform the performance test of the encapsulated task, or it may select any one running device in the managed partition to perform the performance test of the encapsulated task.
  • The control device transmits the test task to at least one of the running devices that it controls.
  • The at least one running device receives the test task.
  • The at least one running device runs the test task to obtain the test result of the test task.
  • Control device 1 selects running device 11 to test the test task, and control device 2 selects running device 21 to test the test task. It should be understood that control device 1 and control device 2 may also select a plurality of running devices within the partitions under their control to test the test task.
  • The performance test result for the first task includes at least one of the following: a running state parameter value during the test of the first task, the maximum interference strength that the first task can withstand, and the interference strength that the first task generates on the running device. According to at least one of these items, a first partition for performing the first task is selected from the plurality of partitions within the cluster.
  • the running status parameter of the first task test includes one or more of the following information, such as a delay, a query rate per second, a response time, and a throughput rate.
  • The at least one running device sends the test result to the control device, and the control device sends the test result to the central control device.
  • control device receives a test result of the test task sent by the at least one running device.
  • control device sends a test result of the test task to the central control device.
  • the central control device acquires test results of the test task sent by the control device of the plurality of partitions.
  • the central control device selects a first partition for performing the first task from a plurality of partitions in the cluster according to the test result.
  • the central control device selects the partition of the control device 1 as the first partition according to the test result sent by the control device 1 and the control device 2.
  • The central control device selects the one or more running state parameters that are most important for performing the first task as the indicator for selecting the first partition to perform the first task.
  • The central control device compares the test results, for example the delay, the query rate per second, the response time, the throughput rate, the maximum interference strength that the task can withstand, and the interference strength that the first task generates on the running device, and selects a parameter that affects the execution of the first task as the indicator for selecting the first partition. For example, when the first task requires the fastest response time, the partition with the fastest response time may be selected as the first partition to perform the first task.
  • the partition that performs the first task is selected from the plurality of partitions according to the running state parameter value at the time of the first task test.
  • a partition for executing the first task is selected from the plurality of partitions according to a resource usage information required for the first task operation obtained by each partition.
  • For example, the query rate per second of partition 1 is greater than that of partition 2 and also greater than that of partition 3. Therefore, partition 1, which has the highest query rate per second, is selected as the first partition of the first task.
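  • Selecting a partition by a single running state parameter can be sketched as below. The test result values and all names are hypothetical; the sketch only illustrates that some parameters are better when higher (query rate per second) and others when lower (response time).

```python
from typing import Dict


def select_partition(results: Dict[int, Dict[str, float]],
                     metric: str,
                     higher_is_better: bool) -> int:
    """Return the partition whose test result is best for the given metric."""
    pick = max if higher_is_better else min
    return pick(results, key=lambda partition: results[partition][metric])


# Hypothetical test results reported by the control devices of three partitions.
test_results = {
    1: {"qps": 1200.0, "response_ms": 18.0},
    2: {"qps": 900.0, "response_ms": 12.0},
    3: {"qps": 700.0, "response_ms": 25.0},
}

by_qps = select_partition(test_results, "qps", higher_is_better=True)
by_response = select_partition(test_results, "response_ms", higher_is_better=False)
print(by_qps, by_response)
```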
  • the central control device selects to perform the first partition of the first task according to the matching degree of the maximum interference strength that the first task can withstand and the interference strength of each partition.
  • the first partition performing the first task is selected from the plurality of partitions according to the maximum interference strength that the task can withstand and the interference strength of each partition.
  • For example, the interference strength of partition 1 is interval 1, the interference strength of partition 2 is interval 2, and the interference strength of partition 3 is interval 3, where interval 1 is greater than interval 2 and interval 2 is greater than interval 3. If the maximum interference strength that the first task can withstand exceeds the interference strength of partition 2 and of partition 3, partition 2 or partition 3 may be selected as the first partition of the first task; because partition 3 interferes least with the first task, partition 3 should be selected as the first partition of the first task.
  • As another example, the interference strength of partition 1 is interval 1, and the interference strength that the first task can withstand in partition 1 is smaller than interval 1, so the first task is not suitable for execution in partition 1. The interference strength of partition 2 is interval 2, and the interference strength that the first task can withstand in partition 2 is smaller than interval 2, so the first task is not suitable for execution in partition 2. The interference strength of partition 3 is interval 3, and the interference strength that the first task can withstand in partition 3 is greater than interval 3, so partition 3 is selected as the first partition of the first task.
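  • The matching between the interference a task can withstand and the interference strength of each partition can be sketched as follows. For simplicity the sketch assumes that each partition's interference interval is summarised by a single number and that the task has a single tolerance value; both simplifications, the treatment of equality as acceptable, and all names and values are assumptions for illustration.

```python
from typing import Dict, Optional


def select_by_tolerance(partition_interference: Dict[int, float],
                        task_tolerance: float) -> Optional[int]:
    """Pick the least-interfering partition among those the task can withstand.

    Returns None when every partition interferes more than the task tolerates.
    """
    eligible = {
        partition: strength
        for partition, strength in partition_interference.items()
        if strength <= task_tolerance
    }
    if not eligible:
        return None
    return min(eligible, key=eligible.get)


# Hypothetical interference strengths: partition 1 > partition 2 > partition 3.
interference = {1: 0.8, 2: 0.5, 3: 0.2}

chosen = select_by_tolerance(interference, task_tolerance=0.6)
print(chosen)  # partitions 2 and 3 are eligible; partition 3 interferes least
```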
  • the central control device selects a first partition that performs the first task according to the interference intensity generated by the first task to the running device and the interference intensity of each partition.
  • the interference strength generated by the first task on the running device refers to the interference generated by the first task on the running device when it is executed on the running device.
  • For example, the interference strength of partition 1 is interval 1, and the interference strength generated by the first task in partition 1 is 11; the interference strength of partition 2 is interval 2, and the interference strength generated by the first task in partition 2 is 22; the interference strength of partition 3 is interval 3, and the interference strength generated by the first task in partition 3 is 33. The influence of the first task on the interference strength of the three partitions is compared, and the partition whose interference strength is small after the first task is executed is selected as the first partition of the first task.
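  • One possible reading of this comparison is sketched below. The generated strengths 11, 22 and 33 come from the text; the current strength of each partition and the assumption that the two values simply add up are hypothetical, as are all names.

```python
from typing import Dict


def select_by_resulting_interference(current: Dict[int, float],
                                     generated: Dict[int, float]) -> int:
    """Pick the partition with the smallest interference after the task runs.

    Assumes (for illustration only) that the resulting interference is the
    sum of the partition's current strength and the strength the task adds.
    """
    resulting = {p: current[p] + generated[p] for p in current}
    return min(resulting, key=resulting.get)


current_strength = {1: 40.0, 2: 20.0, 3: 10.0}    # hypothetical values
generated_strength = {1: 11.0, 2: 22.0, 3: 33.0}  # values given in the text

first_partition = select_by_resulting_interference(current_strength, generated_strength)
print(first_partition)
```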
  • A weighted average method may also be used: the running state parameter values (such as the delay, the query rate per second, the response time, and the throughput rate), the maximum interference strength that the task can withstand, and the interference strength that the task generates on the running device are weighted and averaged, and the first partition is selected according to the weighted average.
  • The central control device may also select a partition according to the task type. Specifically, when the first task is a long task, the central control device may select a partition with a higher CPU usage rate; when the first task is a batch task, the central control device may select a partition with a lower CPU usage rate.
  • the central control device transmits the first task to the control device of the first partition.
  • the central control device transmits the first task to the control device 1.
  • The control device receives the first task sent by the central control device.
  • Control device 1 receives the first task transmitted by the central control device.
  • The control device selects a target running device that performs the first task in the first partition.
  • The control device selects running device 11 as the target running device of the first task.
  • The control device selects the target running device from the at least one running device that performed the performance test on the first task, according to the test result of the at least one running device.
  • The control device selects, according to the type of the first task, a related parameter that affects the execution of the first task as the condition for selecting the running device that performs the first task. For example, when the first task requires the fastest response time, the running device with the fastest response time can be selected as the target running device for executing the first task.
  • A weighted average method may be used to weight and average the test results, such as the delay, the query rate per second, the response time, and the throughput rate, and the target running device of the first task is selected according to the weighted average.
  • For example, for the at least one running device that tested the first task, the control device weights and averages the query rate per second and the response time, and selects the target running device of the first task according to the weighted average.
  • The control device may also select any running device in the partition as the target running device to perform the first task.
  • The target running device may be a running device in the partition that tested the test task encapsulating the first task, or a running device in the partition that did not test that test task.
  • control device sends the first task to the target running device.
  • control device 1 transmits the first task to the operating device 11.
  • the target running device receives the first task sent by the control device.
  • the target running device performs the first task.
  • the central control device selects a partition for executing the first task from a plurality of partitions in the cluster by using a test result of the test task, and sends the first task to a control device that controls the partition. Therefore, the job scheduling is performed in units of partitions, and the running equipment is reasonably scheduled to utilize system resources reasonably.
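  • The overall flow of method 200 can be summarised in a small simulation. This is a sketch under stated assumptions, not the implementation of the application: the class names, the use of response time as the only test result, and the fixed `response_ms` values standing in for real measurements are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunningDevice:
    name: str
    response_ms: float  # hypothetical stand-in for a measured test result

    def run_test(self, test_task: str) -> float:
        return self.response_ms

    def execute(self, task: str) -> str:
        return f"{self.name} executed {task}"


@dataclass
class ControlDevice:
    partition: int
    devices: List[RunningDevice]

    def test(self, test_task: str) -> Dict[str, float]:
        # Forward the test task to the running devices and collect results.
        return {d.name: d.run_test(test_task) for d in self.devices}

    def dispatch(self, task: str) -> str:
        # Select the target running device inside the partition.
        target = min(self.devices, key=lambda d: d.response_ms)
        return target.execute(task)


@dataclass
class CentralControlDevice:
    controllers: List[ControlDevice]

    def schedule(self, task: str) -> str:
        test_task = f"test({task})"  # encapsulate the first task
        best = {c.partition: min(c.test(test_task).values())
                for c in self.controllers}
        first_partition = min(best, key=best.get)
        controller = next(c for c in self.controllers
                          if c.partition == first_partition)
        return controller.dispatch(task)


central = CentralControlDevice([
    ControlDevice(1, [RunningDevice("running device 11", 12.0)]),
    ControlDevice(2, [RunningDevice("running device 21", 20.0)]),
])
outcome = central.schedule("web-service")
print(outcome)
```

  • The central control device only chooses a partition; the choice of the target running device is left to the control device of that partition, which is the division of responsibility the text describes.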
  • all resource management and task scheduling in the centralized resource management scheduling scheme are on one control node.
  • the central control node becomes the bottleneck of the entire system.
  • In the embodiment of the present application, the central control device selects a partition for executing the first task from multiple partitions in the cluster and sends the first task to the control device that controls that partition. Job scheduling is thus performed in units of partitions and the running devices are scheduled reasonably, which solves the problem in centralized resource management scheduling that the central control node becomes the bottleneck of the whole system when the cluster size is expanded.
  • the system has two schedulers in the hierarchical shared scheduling scheme.
  • The two schedulers share all resources of the system and schedule concurrently, which easily produces scheduling resource conflicts. The more often such conflicts occur, the faster the system performance degrades.
  • In the embodiment of the present application, the first task is sent by the central control device to the control device of the partition, and the control device of the partition selects the running device for the first task. There is no resource conflict caused by multiple schedulers, which improves system performance.
  • When the running device performs the first task, the executed task may occupy resources such as the central processor or the memory of the running device, and the task may generate interference on the central processor, the memory, the network, and the like of the running device. As a result, the parameter value of the resource information and the strength of the interference information of the running device may fall outside the range of the first partition, so the central control device can re-determine the second partition of the running device according to the resource information and/or the interference information after the running device performs the task.
  • The central control device re-determines the second partition of the running device. If the second partition of the running device is different from the first partition of the running device, the running device should be reassigned to the second partition.
  • FIG. 4 is a schematic flowchart of a method 300 for running device scheduling according to an embodiment of the present application. As shown in Figure 4, the method 300 includes the following.
  • The updated resource information of the running device is sent to the central control device.
  • The running device may send the updated resource information to the control device of the first partition to which the running device belongs, and that control device sends the updated resource information to the central control device.
  • Alternatively, the running device may directly send its updated resource information to the central control device, where the resource information includes the central processing unit resource, the disk array resource, and the solid state drive (SSD) resource of the running device after the running device performs the task.
  • the interference information mainly includes the interference intensity generated by the tasks running in the running device.
  • the central control device receives the updated resource information and interference information of the running device.
  • the central control device determines the second partition of the target running device according to the relationship between the updated resource information and the plurality of partition value ranges.
  • If the central control device determines that, after the target running device performs the first task, the second partition of the target running device is the same partition as the first partition, the partition of the running device is maintained.
  • If the central control device determines that, after the target running device performs the first task, the second partition of the target running device is different from the first partition, the central control device updates the partition to which the target running device belongs from the first partition to the second partition.
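  • The re-determination of the partition after a task has run can be sketched as follows. The sketch assumes that partitions are defined by the central processor usage boundaries (30% and 60%) used earlier in the description; the function names and the membership dictionary are hypothetical.

```python
from typing import Dict, Tuple


def partition_for(cpu_usage_percent: float,
                  bounds: Tuple[float, float] = (30.0, 60.0)) -> int:
    """Map a central processor usage rate to partition 1, 2 or 3."""
    low, high = bounds
    if cpu_usage_percent < low:
        return 1
    if cpu_usage_percent <= high:
        return 2
    return 3


def update_partition(device: str,
                     updated_cpu_usage: float,
                     membership: Dict[str, int]) -> Tuple[int, int]:
    """Re-determine the partition of a device from its updated resource
    information and update the membership only if the partition changed."""
    first = membership[device]
    second = partition_for(updated_cpu_usage)
    if second != first:
        membership[device] = second
    return first, second


membership = {"running device 3": 1}
moved = update_partition("running device 3", 45.0, membership)
print(moved, membership)
```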
  • FIG. 5 is a schematic block diagram of dynamic partitioning of a running device of a method for scheduling a running device according to an embodiment of the present application.
  • the running device 3 of the first partition completes the execution of the first task, and the central control device re-partitions the running device 3 according to the resource information or the interference information updated by the running device 3.
  • When the central control device determines that the partition of running device 3 is the second partition, running device 3 is divided into the second partition; control device 1 of the first partition no longer manages running device 3, and control device 2 of the second partition manages running device 3.
  • the central control device sends first indication information to the control device of the first partition, where the first indication information is used to indicate that the control device of the first partition deletes the information of the running device.
  • control device of the first partition receives the first indication information.
  • The central control device sends second indication information to the running device, where the second indication information is used to instruct the target running device to change the partition to which it belongs from the first partition to the second partition.
  • The control device of the first partition sends third indication information to the central control device, where the third indication information is used to indicate that the control device has deleted the resource information of the running device. After receiving the third indication information, the central control device sends the second indication information to the running device, where the second indication information is used to instruct the target running device to change the partition to which it belongs from the first partition to the second partition.
  • the operating device receives the second indication information.
  • The running device sends its updated resource information to the control device of the second partition.
  • The control device of the second partition receives the updated resource information of the running device.
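  • The exchange of indication information during a partition change can be sketched as an ordered event log. The sketch follows the variant in which the third indication information precedes the second; the function name, the data structure, and the log wording are hypothetical.

```python
from typing import Dict, List, Set


def move_device(device: str, first: int, second: int,
                managed: Dict[int, Set[str]]) -> List[str]:
    """Simulate moving a running device from one partition to another and
    return the sequence of indication messages that were exchanged."""
    log: List[str] = []

    # First indication: central control device -> control device of first partition.
    managed[first].discard(device)
    log.append(f"first indication: control device {first} deletes {device}")

    # Third indication: control device of first partition -> central control device.
    log.append(f"third indication: control device {first} confirms deletion")

    # Second indication: central control device -> running device.
    log.append(f"second indication: {device} changes partition {first} -> {second}")

    # The running device reports to the control device of the second partition.
    managed[second].add(device)
    log.append(f"{device} sends updated resource information to control device {second}")
    return log


managed = {1: {"running device 3"}, 2: set()}
events = move_device("running device 3", 1, 2, managed)
for line in events:
    print(line)
```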
  • the embodiment of the present invention further includes the corresponding processes of the foregoing methods in FIG. 2 and FIG. 3, and details are not described herein for brevity.
  • After the running device performs the first task, the running device sends its updated resource information or interference information to the central control device, and the central control device re-determines the partition in which the running device is located according to the updated resource information and/or interference information. Therefore, the central control device can detect the resource usage of the entire cluster in real time and utilize system resources reasonably.
  • The central controller in a hierarchical scheduling scheme is only responsible for managing cluster resources and allocating cluster resources to the computing frameworks. Each computing framework performs task scheduling according to the resources allocated to it, and no computing framework can detect the real-time resource usage of the entire cluster.
  • In the method of the embodiment of the present application, after the target running device performs the task, it sends its updated resource information to the central control device, so that the central control device can detect the resource usage of the entire cluster in real time.
  • The central control device re-determines the partition in which the running device is located according to the updated resource information or interference information, schedules the running devices reasonably, and utilizes system resources reasonably, thereby managing system resources by partition.
  • FIG. 6 is a schematic block diagram of a central control device 400 in accordance with an embodiment of the present application. As shown in FIG. 6, the central control device 400 includes:
  • the sending module 410 is configured to send a test task to the control device of the multiple partitions in the cluster, where the test task is a test task of the first task, and each of the multiple partitions includes at least one running device;
  • the obtaining module 420 is configured to receive a test result of the test task sent by the control device of the multiple partitions;
  • the selecting module 430 is configured to select, according to the test result, a first partition for performing a first task from a plurality of partitions in the cluster, where the first task is a to-be-executed task;
  • the sending module 410 is further configured to send the first task to the control device of the first partition, so that the control device selects a target running device that performs the first task from the first partition.
  • the selecting module 430 is specifically configured to: encapsulate the first task as the test task; and send the test task to the control device of each partition. The test result specifically includes the test result for the encapsulated to-be-executed task.
  • the test result includes: at least one of a running state parameter value of the first task test, a maximum interference strength that the first task can withstand, and an interference strength of the first task to the running device;
  • the selection module 430 is specifically configured to:
  • At least one of the operating state parameter value at the time of the first task test, the maximum interference strength that the first task can withstand, and the interference intensity generated by the first task to the operating device, from the plurality of partitions in the cluster A first partition for performing the first task is selected.
  • the central control device further includes a dividing module 440, configured to use the resource information and/or the interference information of the multiple running devices acquired by the acquiring module 410 to The running device is divided into the plurality of partitions;
  • the partitioning module 440 is further configured to allocate a control device for each of the partitions.
  • the obtaining module 420 is further configured to:
  • resource information and/or interference information of a plurality of running devices in the cluster where the resource information is used to indicate a resource that can be used in the running device, where the interference information includes a task on the running device
  • the intensity of the interference generated by the device is used to indicate a resource that can be used in the running device.
  • the obtaining module 420 is further configured to: when the target running device completes the first task, acquire updated resource information of the target running device;
  • the dividing module 440 is further configured to update the partition to which the target running device belongs from the first partition to the second partition according to the updated resource information sent by the target running device and the resource information of the multiple running devices in the cluster.
  • the dividing module 440 is specifically configured to: divide each running device according to a relationship between a resource information parameter value and/or interference information of each running device of the multiple running devices and multiple value ranges The corresponding partition.
  • the obtaining module 420 is specifically configured to: acquire the first task input by the user through an interface of the first computing framework of the multiple computing frameworks; the sending module 430 is specifically configured to: by calling the first computing framework The control device controlling the first partition transmits the first task.
  • FIG. 8 is a schematic block diagram of a control device 500 according to an embodiment of the present application. As shown in FIG. 8, the control device 500 includes:
  • a receiving module 510, configured to receive a test task sent by a central control device, where the control device is configured to manage at least one running device in a first partition among multiple partitions in a cluster, and the test task is a test task for a first task;
  • a selecting module 520, configured to select at least one running device in the partition to performance-test the test task; and
  • a sending module 530, configured to send the performance test parameter values of the test task to the central control device, for the central control device to determine the first partition for the first task.
  • The receiving module 510 is further configured to receive the performance test parameter values of the test task sent by the running device that performance-tested the test task.
  • The receiving module 510 is further configured to receive the first task sent by the central control device, where the first task is a task to be executed.
  • The selecting module 520 is further configured to select, in the first partition, a target running device to execute the first task.
  • Optionally, the control device further includes a scheduling module, where the receiving module 510 is specifically configured to receive the first task sent by the central control device; the selecting module 520 is specifically configured to select, in the first partition, the target running device to execute the first task; and the scheduling module is specifically configured to schedule the target running device to execute the first task.
  • Optionally, the selecting module 520 is specifically configured to select the target running device from the at least one running device that performance-tested the first task, according to the performance test parameter values of the at least one running device that performance-tested the first task.
  • Optionally, the receiving module 510 is specifically configured to receive first indication information sent by the central control device, where the first indication information instructs the control device to delete the information about the target running device; and the sending module 530 is further configured to send second indication information to the central control device, where the second indication information indicates that the control device has deleted the information about the target running device.
  • FIG. 9 is a schematic block diagram of a running device 600 according to an embodiment of the present application. As shown in FIG. 9, the running device 600 includes:
  • a receiving module 610, configured to receive a test task sent by a control device;
  • a testing module 620, configured to test the test task to obtain the test result of the test task, where the control device is configured to manage a first partition in which the running device is located, and the first partition includes at least one running device; and
  • a sending module 630, configured to send the test result to the control device, for the control device to send the test result to the central control device, so that the central control device selects, from multiple partitions, a partition to execute the first task.
  • Optionally, the running device further includes an execution module, where the receiving module 610 is further configured to receive the first task sent by the control device, and the execution module is configured to execute the first task.
  • Optionally, the sending module 630 is further configured to: before the running device executes the first task sent by the control device, send the resource information and/or interference information of the running device to the central control device or to the control device to which the running device belongs, for the central control device to assign the running device to a partition.
  • Optionally, the receiving module 610 is further configured to receive first indication information sent by the central control device or by the control device to which the running device belongs, where the first indication information instructs the running device to update the partition to which it belongs from the first partition to the second partition; and the sending module 630 is further configured to send the resource information of the running device to the control device of the second partition.
  • It should be understood that, in the various embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution. The order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • FIG. 10 is a schematic structural diagram of a communication device 700 according to an embodiment of the present application.
  • As shown in FIG. 10, the communication device 700 includes a processor 710, a memory 720, and a transceiver 730.
  • The memory 720 is configured to store instructions.
  • The processor 710 is configured to execute the instructions stored in the memory 720.
  • The processor 710 can control the transceiver 730 to communicate externally.
  • The processor 710, the memory 720, and the transceiver 730 communicate with one another over internal connection paths to transfer control and/or data signals.
  • Optionally, the communication device may be a central control device.
  • When the communication device 700 is a central control device, the processor 710 can invoke the instructions in the memory 720 to implement the corresponding processes performed by the central control device in the methods of FIG. 2 to FIG. 5. For brevity, details are not repeated here.
  • Optionally, the communication device may also be a control device.
  • When the communication device 700 is a control device, the processor 710 can invoke the instructions in the memory 720 to implement the corresponding processes performed by the control device in the methods of FIG. 2 to FIG. 5. For brevity, details are not repeated here.
  • Optionally, the communication device may also be a running device.
  • When the communication device 700 is a running device, the processor 710 can invoke the instructions in the memory 720 to implement the corresponding processes performed by the running device in the methods of FIG. 2 to FIG. 5. For brevity, details are not repeated here.
  • In the embodiments of the present application, the processor may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
  • The processor may further include a hardware chip.
  • The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL) device, or any combination thereof.
  • The memory may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • The volatile memory may be a random access memory (RAM), which serves as an external cache.
  • The embodiments of the present application provide a computer-readable medium for storing a computer program, where the computer program includes instructions for performing the methods of the embodiments of the present application described with reference to FIG. 2 to FIG. 10.
  • The readable medium may be a ROM or a RAM, which is not limited in the embodiments of the present application.
  • In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners.
  • For example, the device embodiments described above are merely illustrative.
  • For example, the division into units is only a division by logical function.
  • In actual implementation there may be other ways of dividing them: for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
  • The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application.
  • The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

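The embodiments of the present application provide a method for scheduling running devices, a central control device, a control device, and a running device. Running devices are scheduled reasonably on a per-partition basis, a suitable partition for executing a task is selected according to the test results for that task, the system resources in the cluster are used reasonably, and running devices are scheduled reasonably. The method includes: the central control device sends a test task to the control devices of multiple partitions in a cluster, where each of the multiple partitions includes at least one running device; obtains the test results of the test task sent by the control devices of the multiple partitions; selects, according to the test results, a first partition for executing the first task from the multiple partitions in the cluster; and sends the task to the control device of the first partition, so that the control device selects, from the first partition, a running device to execute the task.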

Description

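Method and device for scheduling running devices, and running device
This application claims priority to Chinese Patent Application No. 201710112027.3, filed with the Chinese Patent Office on February 28, 2017 and entitled "Method and device for scheduling running devices, and running device", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of communications, and more specifically, to a method and a device for scheduling running devices, and a running device.
Background
With the rapid worldwide spread of information networks, the data generated over the Internet is growing quickly. How to process massive amounts of data and services so as to provide users with convenient and fast services has become an important problem in the development of information technology (IT).
Cloud computing enables convenient, on-demand access to a shared pool of computing resources. The resources in the shared pool may be called system resources and include networks, servers, storage, application software, services, and the like. Existing resource scheduling solutions manage system resources and job scheduling centrally, cannot detect the resource usage of the entire system in real time, and therefore cannot arrive at a better-optimized scheduling scheme.
Therefore, how to manage system resources and schedule jobs so that system resources are used reasonably is a problem that urgently needs to be solved.
Summary
The embodiments of the present application provide a method and a device for scheduling running devices, and a running device, which manage system resources by partition, can detect the resource usage of the entire system in real time, schedule running devices reasonably, and use system resources reasonably.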
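According to a first aspect, a method for scheduling running devices is provided, including: sending a test task to the control devices of multiple partitions in a cluster, where each of the multiple partitions includes at least one running device; obtaining the test results of the test task sent by the control devices of the multiple partitions; selecting, according to the test results, a first partition for executing a first task from the multiple partitions in the cluster, where the first task is a task to be executed; and sending the first task to the control device of the first partition, so that the control device selects, from the first partition, a target running device to execute the first task.
Optionally, the central control device and the control devices of different partitions may be the same device.
Therefore, in the embodiments of the present application, the central control device sends the test task to the control devices of multiple partitions in the cluster, so that running devices are scheduled reasonably on a per-partition basis. The central control device uses the test results of the test task to select, from the multiple partitions in the cluster, a suitable partition for executing the first task, so that the system resources in the cluster are used reasonably.
In an optional implementation of the first aspect, sending the test task to the control devices of the multiple partitions in the cluster includes: packaging the first task as the test task; and sending the test task to the control device of each partition.
Optionally, the central control device and/or the control device of each partition may belong to the cluster.
Optionally, the central control device and/or the control device of each partition may not belong to the cluster.
Optionally, the central control device may randomly select at least one of the multiple partitions and send the packaged first task to the control device of that partition.
In an optional implementation of the first aspect, the test results include at least one of: a running state parameter value of the first task during testing, the maximum interference strength that the first task can withstand, and the interference strength that the first task imposes on the running device.
Optionally, the running state parameters of the first task during testing include one or more indicators such as latency, queries per second, response time, and throughput.
Optionally, the central control device selects one or more running state parameters that affect the first task as the criteria for selecting the first partition to execute the first task.
Optionally, the central control device selects the first partition to execute the first task according to how well the maximum interference strength that the first task can withstand matches the interference strength of each partition.
Optionally, the central control device selects the first partition to execute the first task according to the interference strength that the first task imposes on a running device and the interference strength of each partition.
Optionally, the central control device may also use a weighted average method: it computes a weighted average of multiple performance test parameter values, such as latency, queries per second, response time, throughput, and the maximum interference strength that the first task can withstand, and uses the result to select the first partition.
In this case, the central control device selects the partition for executing the first task according to the parameter values that each partition obtained when performance-testing the first task. The partition can thus be chosen more precisely, and system resources are used reasonably.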
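In an optional implementation of the first aspect, before the test task is sent to the control devices of the multiple partitions in the cluster, the method further includes:
obtaining resource information and/or interference information of multiple running devices in the cluster, where the resource information indicates the status of the resources available on a running device, and the interference information includes the interference strength that tasks on a running device impose on that running device; dividing the multiple running devices into the multiple partitions according to their resource information and/or interference information; and allocating a control device to each partition.
Optionally, the status of the resources available on the running device includes resource utilization, the proportion of resources remaining, the resources already in use, or the resources available for use.
Optionally, the interference strength that tasks on the running device impose on the running device includes the interference strength imposed by tasks that have already been executed or are being executed on the running device.
Optionally, the central control device and/or the control device of each partition may belong to the cluster.
Optionally, the central control device and/or the control device of each partition may not belong to the cluster.
In this case, the central control device divides the multiple running devices into partitions and allocates a control device to each partition, so that resources are managed on a per-partition basis. The central control device no longer manages system resources and task scheduling centrally, which solves the problem of the central control device becoming a system bottleneck as system resources grow.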
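In an optional implementation of the first aspect, when the target running device completes the first task, the method further includes: the central control device obtains updated resource information and/or interference information of the target running device; and, according to the relationship between the updated resource information and/or interference information sent by the target running device and the value ranges of the multiple partitions, updates the partition to which the target running device belongs from the first partition to a second partition.
Optionally, the central control device receives the updated resource information sent by the running device.
Optionally, the running device sends the updated resource information to the control device of the first partition, and the control device of the first partition sends the updated resource information to the central control device.
In an optional implementation of the first aspect, updating the partition to which the target running device belongs from the first partition to the second partition includes: the central control device sends first indication information to the control device of the first partition, where the first indication information instructs the control device of the first partition to delete the information about the target running device; and the central control device sends second indication information to the target running device, where the second indication information instructs the target running device to change the partition to which it belongs from the first partition to the second partition.
Optionally, the central control device sends first indication information to the target running device, where the first indication information instructs the target running device to change the partition to which it belongs from the first partition to the second partition; and the central control device sends second indication information to the control device of the first partition, where the second indication information instructs the control device of the first partition to delete the information about the target running device.
Optionally, the central control device sends first indication information to the control device of the first partition, where the first indication information instructs the control device of the first partition to delete the information about the target running device. After the central control device receives second indication information sent by the control device of the first partition, it sends third indication information to the target running device. The second indication information indicates that the control device of the first partition has deleted the information about the running device, and the third indication information instructs the target running device to change the partition to which it belongs from the first partition to the second partition.
In this case, by obtaining the updated resource information of the running device, the central control device can detect the usage of the entire system's resources in real time. The central control device re-divides the partition of the running device according to the updated resource information and/or interference information, so that partitions are divided dynamically, running devices are scheduled reasonably, and system resources are used reasonably.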
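In an optional implementation of the first aspect, the resources of the running device include at least one of the following: storage information of the running device; central processing unit information of the running device; network information of the running device; heterogeneous acceleration information of the running device; and interference information of the running device.
In an optional implementation of the first aspect, dividing the multiple running devices into the multiple partitions according to their resource information and/or interference information includes: assigning each running device to the corresponding partition according to the relationship between the resource information parameter value of each of the multiple running devices and the value ranges of the multiple partitions.
Optionally, each running device is assigned to the corresponding partition according to the relationship between one kind of information parameter value of the running device and the value ranges of the multiple partitions.
Optionally, a weighted average is computed over multiple kinds of information parameter values of each running device, and the weighted average is used to determine the partition of the running device.
In this case, the central control device partitions the multiple running devices in the cluster according to the relationship between their resource information and/or interference information and the value ranges of the multiple partitions, forming multiple partitions. System resource management and job scheduling are then performed on a per-partition basis, running devices are scheduled reasonably, and system resources are used reasonably.
In an optional implementation of the first aspect, before the test task is sent to the control devices of the multiple partitions in the cluster, the method further includes:
obtaining the first task entered by a user through an interface of a first computing framework among multiple computing frameworks; and sending the first task to the control device that controls the first partition includes: sending the first task to the control device that controls the first partition by invoking the first computing framework.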
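According to a second aspect, a method for scheduling running devices is provided, including: a control device receives a test task sent by a central control device, where the control device is configured to manage at least one running device in a first partition among multiple partitions in a cluster; selects at least one running device in the partition to test the test task; receives the test parameter values of the test task sent by the running device that tested the test task; and sends the test parameter values of the test task to the central control device, for the central control device to determine the first partition for the first task.
Therefore, in the embodiments of the present application, the control device receives the test task sent by the central control device, selects at least one running device in the partition to performance-test the test task, receives the performance test parameter values of the test task sent by that running device, and sends those values to the central control device. The central control device can thus select a suitable partition to execute the first task, the system resources in the cluster are used reasonably, and running devices are scheduled reasonably.
In an optional implementation of the second aspect, the control device receives the first task sent by the central control device; selects, in the first partition, a target running device to execute the first task; and schedules the target running device to execute the first task.
In an optional implementation of the second aspect, selecting, in the first partition, the target running device to execute the first task includes: selecting the target running device from the at least one running device that tested the first task, according to the test parameter values of the at least one running device that tested the first task.
Optionally, the control device may select any running device in the first partition to execute the first task.
In an optional implementation of the second aspect, the method further includes: receiving first indication information sent by the central control device, where the first indication information instructs the control device to delete the information about the running device that executed the first task; and sending second indication information to the central control device, where the second indication information indicates that the control device has deleted the information about the target running device.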
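According to a third aspect, a method for scheduling running devices is provided, including: a running device receives a test task sent by a control device, where the control device is configured to manage a first partition in which the running device is located, and the first partition includes at least one running device;
tests the test task to obtain the test result of the test task; and
sends the test result to the control device, for the control device to send the test result to the central control device, so that the central control device selects, from multiple partitions, a partition to execute a first task.
Therefore, in the embodiments of the present application, the running device sends the test result to the control device, the control device forwards it to the central control device so that the central control device can select, from multiple partitions, the partition to execute the first task, and the running device executes the first task sent by the control device. In this way, the central control device schedules running devices reasonably, selects a suitable running device for the first task, and uses computer resources reasonably.
In an optional implementation of the third aspect, the running device receives the first task sent by the control device and executes the first task.
In an optional implementation of the third aspect, before the running device executes the first task sent by the control device, the method further includes: sending the resource information of the running device to the central control device or to the control device to which the running device belongs, for the central control device to assign the running device to a partition.
In an optional implementation of the third aspect, the method further includes: receiving first indication information sent by the central control device or by the control device to which the running device belongs, where the first indication information instructs the running device to update the partition to which it belongs from the first partition to the second partition; and sending the resource information of the running device to the control device of the second partition.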
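According to a fourth aspect, a system for scheduling running devices is provided. The system includes a central control device, multiple control devices, and multiple running devices, where the multiple running devices are divided into multiple partitions, each of the multiple partitions includes at least one running device, and each of the multiple control devices controls one of the multiple partitions.
The central control device is configured to: send a test task to the multiple control devices; receive the test results sent by the multiple control devices; select, according to the test results, a first partition for executing a task from the multiple partitions; and send the task to the control device of the first partition.
The control device is configured to: receive the test task sent by the central control device and send the test task to at least some of the running devices in the partition it controls; receive the test results sent by those running devices and send the test results to the central control device; and receive the task sent by the central control device, select a target running device in the partition it controls to execute the task, and schedule that target running device to execute the task.
The running device is configured to: receive the test task sent by its control device, test the test task to obtain the test result of the test task, and send the test result to its control device; and execute the task as scheduled by the control device.
Therefore, in the embodiments of the present application, the central control device uses the test results of the test task to select, from the multiple partitions in the cluster, a suitable partition for executing the first task, so that the system resources in the cluster are used reasonably and running devices are scheduled reasonably.
In an optional implementation of the fourth aspect, the central control device is further configured to package the task to obtain the test task.
In an optional implementation of the fourth aspect, the test results include at least one of: a running state parameter value of the task during testing, the maximum interference strength that the task can withstand, and the interference strength that the task imposes on the running device.
In an optional implementation of the fourth aspect, the central control device is further configured to: obtain the resource information and/or interference information of the multiple running devices, where the resource information indicates the status of the resources available on a running device, and the interference information includes the interference strength that tasks on a running device impose on that running device; divide the multiple running devices into the multiple partitions according to their resource information and/or interference information; and allocate, from the multiple control devices, a control device to each partition.
In an optional implementation of the fourth aspect, the running device is further configured to send its updated resource information and/or interference information to the central control device or to the control device to which it belongs.
The control device is further configured to receive the updated resource information and/or interference information sent by the running device and to send it to the central control device.
The central control device is further configured to: receive the updated resource information and/or interference information of the target running device sent by the running device or by the control device to which the running device belongs; and update the partition to which the running device belongs according to the relationship between the updated resource information and/or interference information and the value ranges of the multiple partitions.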
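According to a fifth aspect, a central control device is provided, configured to perform the method in the first aspect or in any possible implementation of the first aspect. Specifically, the central control device includes modules or units for performing the method in the first aspect or in any possible implementation of the first aspect.
According to a sixth aspect, a control device is provided, configured to perform the method in the second aspect or in any possible implementation of the second aspect. Specifically, the control device includes modules or units for performing the method in the second aspect or in any possible implementation of the second aspect.
According to a seventh aspect, a running device is provided, configured to perform the method in the third aspect or in any possible implementation of the third aspect. Specifically, the running device includes modules or units for performing the method in the third aspect or in any possible implementation of the third aspect.
According to an eighth aspect, a central control device is provided, configured to perform the method in the first aspect or in any possible implementation of the first aspect. The central control device includes a processor, a memory, and a transceiver, and the processor is configured to invoke instructions stored in the memory to perform the method in the first aspect or in any optional implementation thereof.
According to a ninth aspect, a control device is provided, configured to perform the method in the second aspect or in any possible implementation of the second aspect. The control device includes a processor, a memory, and a transceiver, and the processor is configured to invoke instructions stored in the memory to perform the method in the second aspect or in any optional implementation thereof.
According to a tenth aspect, a running device is provided, configured to perform the method in the third aspect or in any possible implementation of the third aspect. The running device includes a processor, a memory, and a transceiver, and the processor is configured to invoke instructions stored in the memory to perform the method in the third aspect or in any optional implementation thereof.
According to an eleventh aspect, a computer-readable medium is provided for storing a computer program, where the computer program includes instructions for performing the method in the first aspect or any possible implementation of the first aspect, the second aspect or any possible implementation of the second aspect, and the third aspect or any possible implementation of the third aspect.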
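Brief Description of the Drawings
FIG. 1 is a schematic diagram of a cluster communication system according to an embodiment of the present application.
FIG. 2 is a schematic flowchart of a method for scheduling running devices according to an embodiment of the present application.
FIG. 3 is a schematic diagram of dividing running devices into partitions according to an embodiment of the present application.
FIG. 4 is a schematic flowchart of a method for scheduling running devices according to an embodiment of the present application.
FIG. 5 is a schematic diagram of dynamic partitioning according to an embodiment of the present application.
FIG. 6 is a schematic block diagram of a central control device according to an embodiment of the present application.
FIG. 7 is a schematic block diagram of a central control device according to an embodiment of the present application.
FIG. 8 is a schematic block diagram of a control device according to an embodiment of the present application.
FIG. 9 is a schematic block diagram of a running device according to an embodiment of the present application.
FIG. 10 is a schematic structural diagram of a communication device according to an embodiment of the present application.
Detailed Description
The technical solutions in the present application are described below with reference to the accompanying drawings.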
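FIG. 1 is a schematic diagram of a cluster communication system according to an embodiment of the present application. As shown in FIG. 1, the system 100 includes a central control device, control devices, and running devices. The central control device is central control device 101; the control devices are control device 110 and control device 111; and the running devices are running device 120, running device 121, running device 122, and running device 123.
All the running devices form a cluster, and the resources of all the running devices form a shared resource pool. The resources in the pool include each running device's central processing unit resources, disk array resources, solid-state drive (SSD) storage resources, network resources, and heterogeneous acceleration resources.
Network resources include network topology and network bandwidth. Heterogeneous acceleration resources include GPUs, GPGPUs, GPDSPs, ASICs, FPGAs, and other types of many-core processors.
It should be understood that the central control device and/or the control devices may also be inside the cluster; that is, a running device in the cluster may be chosen to act as the central control device of the whole cluster or as the control device of a partition.
Running device 120 and running device 121 belong to the same partition, and control device 110 performs resource management and task scheduling for that partition. Running device 122 and running device 123 belong to the same partition, and control device 111 performs resource management and task scheduling for that partition. The central control device 101 selects a partition for a task and sends the task to the control device of the selected partition, and that control device schedules a running device in the partition to execute the task. A partition is a logical region containing some of the running devices in the cluster: the resource usage of the running devices in the same partition falls within the same range, and/or the interference strength that tasks impose on the running devices in the same partition falls within the same range.
The central control device hosts various computing frameworks, such as Hadoop, Spark, MPI, and Storm. Hadoop is a computing framework for distributed data processing on computers and is suited to offline processing of large batches of data. Spark is a parallel computing framework based on in-memory computing; it keeps data in memory as far as possible to improve the efficiency of iterative and interactive applications, and cannot be used to process data that needs to be stored for a long time. MPI is a parallel computing framework based on message passing; it is suited to parallel computing for a wide range of complex applications and supports multiple programs, multiple data. Storm is an online real-time processing framework; it does not collect or store data, but receives data over the network and processes it in real time. A user submits a task through an interface of a computing framework, and the central control device starts the corresponding computing framework according to the task type. The computing frameworks manage tasks and send them to the control devices of partitions.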
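The step in which the central control device starts a computing framework according to the task type can be pictured as a simple lookup. The following is only an illustrative sketch: the task-type labels and the `choose_framework` function are hypothetical names that do not appear in the application, and only the pairing of workload to framework follows the descriptions above.

```python
# Hypothetical task-type labels mapped to the frameworks named in the text.
FRAMEWORK_BY_TASK_TYPE = {
    "offline_batch": "Hadoop",        # offline processing of large batches of data
    "iterative_in_memory": "Spark",   # iterative and interactive applications
    "message_passing": "MPI",         # parallel computing based on message passing
    "online_realtime": "Storm",       # online real-time processing
}


def choose_framework(task_type: str) -> str:
    """Return the framework that would be started for the given task type."""
    try:
        return FRAMEWORK_BY_TASK_TYPE[task_type]
    except KeyError:
        raise ValueError(f"no computing framework registered for {task_type!r}") from None


print(choose_framework("online_realtime"))  # Storm
```

A table keeps the mapping easy to extend if further frameworks are hosted on the central control device.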
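Optionally, the running device may be a physical server, a virtual machine, or a container.
The system shown in FIG. 1 is provided only to make the present application easier to understand and does not limit the embodiments of the present application. For example, besides control device 110 and control device 111, the central control device may manage other control devices, and each control device is not limited to scheduling two running devices: it may schedule only one, or three or more.
For a better understanding of the present application, the embodiments are described below with reference to FIG. 2 to FIG. 10, using a system that is the same as or similar to the system shown in FIG. 1 as an example.
FIG. 2 is a schematic flowchart of a method 200 for scheduling running devices according to an embodiment of the present application. FIG. 2 shows two control devices, control device 1 and control device 2; control device 1 controls running device 11, and control device 2 controls running device 21. This is only for ease of description and does not limit the embodiments of the present application. For example, besides control device 1 and control device 2, the central control device may manage other control devices, and each control device is not limited to controlling one running device: it may control multiple running devices.
As shown in FIG. 2, the method 200 includes the following.
In 201, the central control device sends a test task to the control devices of multiple partitions in the cluster, where the test task is a test task for a first task, and each of the multiple partitions includes at least one running device.
For example, as shown in FIG. 2, the central control device sends the test task to control device 1 and control device 2.
Optionally, before the test task is sent to the control devices of the multiple partitions in the cluster, the method further includes obtaining the first task.
Optionally, a user submits the first task on the central control device.
Optionally, a user submits the first task on a client, and the client sends the first task to the central control device.
Optionally, the first task may be a long task, a batch computing task, or the like. A long task is a task that runs on the platform for a long time, such as a web service program. A batch task is a task that performs a large amount of computation at once but runs for a short time, such as Hadoop big-data processing.
It should be understood that, before the central control device obtains the first task, it may already have divided the multiple running devices in the cluster into partitions, as shown in FIG. 3.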
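FIG. 3 is a schematic diagram of dividing running devices into partitions in a method for scheduling running devices according to an embodiment of the present application. The central control device obtains the resource information and/or interference information of the multiple running devices in the cluster, divides the running devices in the cluster into partitions, and allocates a control device to each partition. As shown in FIG. 3, running device 0, running device 1, and running device 2 belong to partition 1, which is managed by control device 1; running device 3, running device 4, and running device 5 belong to partition 2, which is managed by control device 2; and running device 7, running device 8, and running device 9 belong to partition 3, which is managed by control device 3.
The running devices in the cluster may each collect their own resource information and/or interference information through an agent plug-in and send it to the central control device.
Optionally, the resource information indicates the status of the resources available on the running device, and the interference information includes the interference strength that tasks on the running device impose on that running device.
Optionally, the resources available on a running device, as referred to in the embodiments of the present application, include at least one of the running device's central processing unit resources, disk array resources, solid-state drive (SSD) storage resources, network resources, and heterogeneous acceleration resources. The status of the resources available on the running device includes the resources available for use, the resources already in use, resource utilization, or the proportion of resources remaining.
Optionally, the central control device obtains the interference information of the multiple running devices in the cluster and may divide the running devices into partitions according to that interference information. The interference information may include the interference strength that tasks running on a running device impose on its central processing unit, memory, network, and so on. Different kinds of interference affect the first task differently, and a partition that interferes less with the first task can be selected to execute it.
Optionally, the interference that the first task imposes on a running device and the interference that the first task itself can withstand are obtained through an application resource interference model. The application resource interference model is a software program that describes the interference an application imposes on system resources (15 kinds) and the interference the application itself can withstand, as shown in Table 1:
Table 1. Form of the application resource interference model
Figure PCTCN2018077191-appb-000001
As shown in Table 1, T1_C_T denotes the interference value on the CPU resource that task T_1 can withstand, and T1_C_C denotes the interference value that task T_1 imposes on the CPU resource. Table 1 lists only some of the resources.
Specifically, the first task T_1 is run alone to obtain one of its performance indicators; for example, a web application server is run alone and the central processing unit's computations-per-second indicator is obtained. The CPU interference program of the basic interference program SOI is run alone and the CPU computations-per-second indicator is obtained. Then the first task T_1 and the SOI CPU interference program are run at the same time, and the interference strength is adjusted until T_1's computations-per-second performance drops to 95% of its original value (95% is an empirically set value). The interference strength of the SOI at that point is the CPU interference that T_1 can withstand. The change in the SOI's interference strength is monitored throughout this process, and that change is the interference strength that the web application imposes on the CPU.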
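The calibration of the interference a task can withstand can be sketched as a search over interference levels. This is a minimal sketch under stated assumptions: `max_tolerable_interference`, the integer interference levels, and the toy measurement function are hypothetical, and performance is assumed not to recover once it has dropped below the threshold. A real measurement would run the task together with the interference program instead of evaluating a formula.

```python
from typing import Callable


def max_tolerable_interference(
    measure_performance: Callable[[int], float],
    max_level: int = 100,
    threshold: float = 0.95,
) -> int:
    """Return the highest interference level at which performance stays at or
    above `threshold` times the interference-free baseline."""
    baseline = measure_performance(0)  # the task running alone
    tolerated = 0
    for level in range(1, max_level + 1):
        if measure_performance(level) < threshold * baseline:
            break  # performance fell below 95% of the baseline
        tolerated = level
    return tolerated


# Toy stand-in for a real measurement: each interference level costs 4 units.
def toy_measurement(level: int) -> float:
    return 1000.0 - 4.0 * level


print(max_tolerable_interference(toy_measurement))  # 12
```

With the toy measurement, level 12 still yields 952 units (above 95% of 1000), while level 13 yields 948, so 12 is reported.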
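Optionally, the central control device assigns each running device to the corresponding partition according to the relationship between the resource information parameter value and/or the interference information of each of the multiple running devices and the value ranges of the multiple partitions.
For example, partitions may be divided according to the CPU utilization of a running device: a running device whose CPU utilization is below 30% is assigned to partition 1; one whose CPU utilization is at least 30% and at most 60% is assigned to partition 2; and one whose CPU utilization is above 60% is assigned to partition 3.
As another example, partitions may be divided according to SSD storage utilization: a running device whose SSD storage utilization is below 30% is assigned to partition 1; one whose SSD storage utilization is at least 30% and at most 70% is assigned to partition 2; and one whose SSD storage utilization is above 70% is assigned to partition 3.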
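The two threshold examples above can be expressed with one small function. This is an illustrative sketch: the function name `partition_for` and the device names are hypothetical, while the 30%/60% and 30%/70% bounds are taken from the examples.

```python
def partition_for(value_percent: float, low: float, high: float) -> int:
    """Map a utilization percentage to partition 1, 2, or 3.

    Below `low` gives partition 1, from `low` to `high` inclusive gives
    partition 2, and above `high` gives partition 3.
    """
    if value_percent < low:
        return 1
    if value_percent <= high:
        return 2
    return 3


CPU_BOUNDS = (30.0, 60.0)  # CPU utilization example
SSD_BOUNDS = (30.0, 70.0)  # SSD storage utilization example

cpu_usage = {"running_device_0": 12.0, "running_device_3": 45.0, "running_device_7": 83.0}
cpu_partitions = {name: partition_for(usage, *CPU_BOUNDS) for name, usage in cpu_usage.items()}
print(cpu_partitions)
```

The same value can land in different partitions depending on the metric: 65% falls in partition 3 under the CPU bounds but in partition 2 under the SSD bounds.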
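Optionally, in the embodiments of the present application, a weighted average method may also be used: a weighted average is computed over CPU resources, disk array resources, SSD storage resources, network resources, heterogeneous acceleration resources, and interference information, and the result is used to determine the partition of the running device.
For example, partitions may be divided jointly according to CPU utilization and SSD storage utilization. Specifically, the weighted average method can be used to compute a weighted average of CPU utilization and SSD storage utilization for each running device, and the partitions are divided according to these weighted averages.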
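A weighted-average division can be sketched as follows. The application does not give weights or value ranges for the combined score, so the weights (0.75 and 0.25), the 30/60 bounds, and all names below are assumptions chosen only for illustration.

```python
from typing import Dict, Tuple


def weighted_average(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the metrics in `values`, using the matching weights."""
    total_weight = sum(weights[name] for name in values)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(values[name] * weights[name] for name in values) / total_weight


def partition_for_score(score: float, bounds: Tuple[float, float] = (30.0, 60.0)) -> int:
    """Map a combined score to partition 1, 2, or 3 (bounds are assumed)."""
    low, high = bounds
    if score < low:
        return 1
    if score <= high:
        return 2
    return 3


WEIGHTS = {"cpu_usage": 0.75, "ssd_usage": 0.25}  # assumed weights
device = {"cpu_usage": 40.0, "ssd_usage": 80.0}   # utilization in percent

score = weighted_average(device, WEIGHTS)
print(score, partition_for_score(score))  # 50.0 2
```

Here 40 × 0.75 + 80 × 0.25 = 50, which falls in the middle range and therefore in partition 2.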
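It should be understood that the embodiments of the present application are described using only the example in which the resource information includes the above five kinds of information plus interference information; the embodiments are not limited thereto, and the resource information may include other information.
It should be understood that the central control device may itself assign each running device to the corresponding partition, according to the relationship between the resource information parameter value of each running device and the value ranges of the multiple partitions, or according to the relationship between the interference information of each running device and those value ranges. Alternatively, the central control device may send the resource information parameters or the interference information of each running device to a device that has a partitioning function; that device assigns each running device to the corresponding partition using the same relationships, and the central control device receives the partitioning result for the multiple running devices from that device.
Optionally, the central control device may allocate a control device to each partition in the cluster. The control device manages the partition, and may or may not be one of the running devices in that partition.
It should be understood that the central control device may or may not belong to the cluster.
Optionally, the central control device packages the first task as the test task. Packaging the first task means compressing and bundling the first task so that a running device can test how it would perform when executing the first task, rather than running the first task directly.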
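In 202, the control device receives the test task sent by the central control device.
For example, as shown in FIG. 2, control device 1 and control device 2 each receive the test task sent by the central control device.
In 203, the control device selects at least one running device in the partition to test the test task.
Optionally, the control device may select multiple running devices in the partition it manages to performance-test the packaged task, or it may select any single running device in that partition to do so.
In 204, the control device sends the test task to at least one running device in the partition that it controls.
In 205, the at least one running device receives the test task.
In 206, the at least one running device tests the test task to obtain the test result of the test task.
For example, as shown in FIG. 2, control device 1 selects running device 11 to test the test task, and control device 2 selects running device 21 to test the test task. It should be understood that control device 1 and control device 2 may also select multiple running devices in the partitions they control to test the test task.
Optionally, the performance test result for the first task includes at least one of: a running state parameter value of the first task during testing, the maximum interference strength that the first task can withstand, and the interference strength that the first task imposes on the running device. The first partition for executing the first task can then be selected from the multiple partitions in the cluster according to at least one of these parameters.
Optionally, the running state parameters of the first task during testing include one or more indicators such as latency, queries per second, response time, and throughput.
In 207, the at least one running device sends the test result to the control device, for the control device to send the test result to the central control device.
In 208, the control device receives the test result of the test task sent by the at least one running device.
In 209, the control device sends the test result of the test task to the central control device.
In 210, the central control device obtains the test results of the test task sent by the control devices of the multiple partitions.
In 211, the central control device selects, according to the test results, a first partition for executing the first task from the multiple partitions in the cluster.
For example, as shown in FIG. 2, the central control device selects the partition of control device 1 as the first partition according to the test results sent by control device 1 and control device 2.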
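Optionally, the central control device selects the one or more running state parameters that matter most for executing the first task as the criteria for selecting the first partition to execute the first task.
Specifically, the central control device compares running state parameter values, for example latency, queries per second, response time, throughput, the maximum interference strength the task can withstand, and the interference strength the first task imposes on the running device, and selects the parameters that affect the execution of the first task as the criteria for selecting the first partition. For example, when the first task requires the fastest response time, the partition with the fastest response time can be selected as the first partition to execute the first task.
For example, the partition to execute the first task is selected from the multiple partitions according to the running state parameter values of the first task during testing. In FIG. 3, the partition is selected according to one item of resource usage information, obtained by each partition, that the first task needs in order to run. The queries per second of partition 2 is greater than that of partition 1 and also greater than that of partition 3, so partition 2, which has the highest queries per second, is selected as the first partition for the first task.
Optionally, the central control device selects the first partition to execute the first task according to how well the maximum interference strength that the first task can withstand matches the interference strength of each partition.
For example, the first partition to execute the first task is selected from the multiple partitions according to the maximum interference strength the task can withstand and the interference strength of each partition. In FIG. 3, the interference strength of partition 1 is interval 1, that of partition 2 is interval 2, and that of partition 3 is interval 3, where interval 1 is greater than interval 2 and interval 2 is greater than interval 3. When the interference strength that the first task can withstand lies within interval 2, either partition 2 or partition 3 could be selected as the first partition, but partition 3 interferes least with the first task and should therefore be selected.
As another example, the interference strength of partition 1 is interval 1, and the interference strength that the first task can withstand is less than interval 1, so the first task is not suited to partition 1. The interference strength of partition 2 is interval 2, and the interference strength that the first task can withstand is less than interval 2, so the first task is not suited to partition 2. The interference strength of partition 3 is interval 3, and the interference strength that the first task can withstand is greater than interval 3, so partition 3 is selected as the first partition for the first task.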
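The interference-matching rule in the two examples above can be sketched as a filter followed by a minimum. As a simplification, each partition's interference interval is represented here by a single number (for instance the upper end of the interval); that representation, the function name `select_partition`, and the sample values are assumptions.

```python
from typing import Dict, Optional


def select_partition(task_tolerance: float,
                     partition_interference: Dict[str, float]) -> Optional[str]:
    """Pick the partition with the least interference among those the task can withstand.

    Returns None when every partition interferes more than the task tolerates.
    """
    eligible = {
        name: strength
        for name, strength in partition_interference.items()
        if strength <= task_tolerance
    }
    if not eligible:
        return None
    return min(eligible, key=eligible.get)


# Partition 1 interferes most and partition 3 least, as in the example.
interference = {"partition_1": 80.0, "partition_2": 50.0, "partition_3": 20.0}

print(select_partition(55.0, interference))  # partition_3
```

With a tolerance of 55, partitions 2 and 3 are both acceptable and partition 3 is chosen because it interferes least; with a tolerance of 10, no partition qualifies.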
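Optionally, the central control device selects the first partition to execute the first task according to the interference strength that the first task imposes on a running device and the interference strength of each partition. The interference strength that the first task imposes on a running device is the interference with the running device's resources that arises when the first task is executed on that running device.
For example, the interference strength of partition 1 is interval 1 and the interference strength the first task produces in partition 1 is 11; the interference strength of partition 2 is interval 2 and the interference strength the first task produces in partition 2 is 22; and the interference strength of partition 3 is interval 3 and the interference strength the first task produces in partition 3 is 33. The effect of the first task on the interference strength of the three partitions is compared, and the partition whose interference strength is smallest after the first task is executed is selected as the first partition for the first task.
Optionally, in the embodiments of the present application, a weighted average method may also be used: a weighted average is computed over running state parameter values such as latency, queries per second, response time, throughput, the maximum interference strength the task can withstand, and the interference strength the task imposes on the running device, and the result is used to select the first partition.
Optionally, the central control device may also select a partition according to the task type. Specifically, when the first task is a long task, the central control device may select a partition with relatively high CPU utilization; when the first task is a batch task, the central control device may select a partition with relatively low CPU utilization.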
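Selection by task type can be sketched as below. The text only says "relatively high" and "relatively low" CPU utilization; taking the highest and the lowest is an assumption made to keep the sketch short, and the function name, task-type labels, and sample figures are hypothetical.

```python
from typing import Dict


def select_partition_by_task_type(task_type: str,
                                  partition_cpu_usage: Dict[str, float]) -> str:
    """Choose a partition from per-partition CPU utilization (percent).

    Assumption: 'relatively high' is read as the highest value and
    'relatively low' as the lowest value.
    """
    if task_type == "long":
        return max(partition_cpu_usage, key=partition_cpu_usage.get)
    if task_type == "batch":
        return min(partition_cpu_usage, key=partition_cpu_usage.get)
    raise ValueError(f"unknown task type: {task_type!r}")


cpu_usage_by_partition = {"partition_1": 20.0, "partition_2": 45.0, "partition_3": 75.0}

print(select_partition_by_task_type("long", cpu_usage_by_partition))   # partition_3
print(select_partition_by_task_type("batch", cpu_usage_by_partition))  # partition_1
```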
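In 212, the central control device sends the first task to the control device of the first partition.
For example, in FIG. 2, the central control device sends the first task to control device 1.
In 213, the control device receives the first task sent by the central control device.
For example, in FIG. 2, control device 1 receives the first task sent by the central control device.
In 214, the control device selects, in the first partition, a target running device to execute the first task.
For example, in FIG. 2, the control device selects running device 11 as the target running device for the first task.
Optionally, the control device selects the target running device from the at least one running device that performance-tested the first task, according to the test results of the at least one running device that performance-tested the first task.
Optionally, the control device selects, according to the type of the first task, the parameters that affect the execution of the first task as the conditions for selecting the running device to execute it. For example, when the first task requires the fastest response time, the running device with the fastest response time can be selected as the target running device.
Optionally, in the embodiments of the present application, a weighted average method may also be used: a weighted average is computed over test results such as latency, queries per second, response time, and throughput, and the result is used to select the target running device for the first task.
Specifically, when the first task is a batch task, the control device applies the weighted average method to queries per second and response time over the at least one running device that tested the first task, and uses the resulting weighted averages to select the target running device for the first task.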
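One way to combine queries per second (higher is better) and response time (lower is better) into a single weighted score is sketched below. The application does not say how metrics with different units and directions are combined; the min-max normalization, the weights, and all names here are assumptions.

```python
from typing import Dict, Set


def score_candidates(results: Dict[str, Dict[str, float]],
                     weights: Dict[str, float],
                     lower_is_better: Set[str]) -> Dict[str, float]:
    """Weighted score per candidate after min-max normalizing each metric to [0, 1]."""
    scores = {name: 0.0 for name in results}
    for metric, weight in weights.items():
        values = [result[metric] for result in results.values()]
        low, high = min(values), max(values)
        for name, result in results.items():
            # All candidates equal on this metric: give each a neutral 0.5.
            normalized = 0.5 if high == low else (result[metric] - low) / (high - low)
            if metric in lower_is_better:
                normalized = 1.0 - normalized
            scores[name] += weight * normalized
    return scores


test_results = {
    "running_device_11": {"queries_per_second": 1200.0, "response_ms": 40.0},
    "running_device_12": {"queries_per_second": 900.0, "response_ms": 25.0},
}
assumed_weights = {"queries_per_second": 0.6, "response_ms": 0.4}

scores = score_candidates(test_results, assumed_weights, lower_is_better={"response_ms"})
target = max(scores, key=scores.get)
print(target)  # running_device_11
```

With these assumed weights, the higher throughput of running device 11 outweighs the faster response of running device 12; different weights could reverse the choice.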
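Optionally, the control device may select any running device in the partition as the target running device to execute the first task.
It should be understood that the target running device may be a running device in the partition that tested the test task packaged from the first task, or a running device in the partition that did not test it.
In 215, the control device sends the first task to the target running device.
For example, as shown in FIG. 2, control device 1 sends the first task to running device 11.
In 216, the target running device receives the first task sent by the control device.
In 217, the target running device executes the first task.
Therefore, in the embodiments of the present application, the central control device uses the test results of the test task to select, from the multiple partitions in the cluster, a partition for executing the first task, and sends the first task to the control device that controls that partition. Jobs are thus scheduled on a per-partition basis, running devices are scheduled reasonably, and system resources are used reasonably.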
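Steps 201 to 217 can be pictured end to end with three small classes. This is a simplified sketch, not the application's implementation: the class and method names are hypothetical, the test result is reduced to a single queries-per-second figure, every running device in a partition is tested, and "best" simply means the highest figure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunningDevice:
    name: str
    queries_per_second: float  # hypothetical figure measured during the test

    def run_test(self, test_task: str) -> float:
        # Steps 205-207: test the packaged task and report the result.
        return self.queries_per_second

    def execute(self, task: str) -> str:
        # Steps 216-217: receive and execute the first task.
        return f"{self.name} executed {task}"


@dataclass
class ControlDevice:
    partition: str
    devices: List[RunningDevice]

    def test(self, test_task: str) -> Dict[str, float]:
        # Steps 202-208: forward the test task and collect the results.
        return {device.name: device.run_test(test_task) for device in self.devices}

    def dispatch(self, task: str, results: Dict[str, float]) -> str:
        # Steps 213-215: choose the target running device and send it the task.
        target_name = max(results, key=results.get)
        target = next(d for d in self.devices if d.name == target_name)
        return target.execute(task)


def schedule(task: str, controllers: List[ControlDevice]) -> str:
    # Steps 201 and 209-212: send the test task, gather results, pick a partition.
    results = {c.partition: c.test(f"test({task})") for c in controllers}
    best = max(controllers, key=lambda c: max(results[c.partition].values()))
    return best.dispatch(task, results[best.partition])


controllers = [
    ControlDevice("partition_1", [RunningDevice("running_device_11", 900.0),
                                  RunningDevice("running_device_12", 1200.0)]),
    ControlDevice("partition_2", [RunningDevice("running_device_21", 1000.0)]),
]
print(schedule("first_task", controllers))  # running_device_12 executed first_task
```

The central function only compares partitions; the choice of device inside the winning partition is left to that partition's control device, which mirrors the two-level split described above.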
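Among existing resource management and scheduling solutions, a centralized solution places the management of all resources and all task scheduling on a single control node, and as the cluster grows the central control node becomes the bottleneck of the whole system. In the method of the embodiments of the present application, the central control device selects, from the multiple partitions in the cluster, a partition for executing the first task and sends the first task to the control device of that partition. Jobs are thus scheduled on a per-partition basis and running devices are scheduled reasonably, which solves the problem that, in centralized resource management and scheduling, the central control node becomes the bottleneck of the whole system as the cluster grows.
Among existing resource management and scheduling solutions, a hierarchical shared scheduling solution has two schedulers in the system that share all of the system's resources. The two schedulers schedule concurrently, which easily leads to conflicts over scheduled resources, and the more conflicts there are, the faster system performance degrades. In the method of the embodiments of the present application, the central control device sends the first task to the partition's control device, and that control device selects the running device for the first task. There is no resource conflict caused by multiple schedulers, and system performance is improved.
It should be understood that, when the running device executes the first task, the task may occupy resources such as the running device's CPU or memory and may interfere with its CPU, memory, network, and so on. As a result, the parameter values of the running device's resource information and the strength in its interference information may fall outside the value ranges of the first partition. The central control device can therefore re-determine a second partition for the running device according to its resource information and/or interference information after it has executed the task.
The central control device re-determines the second partition of the running device; if the second partition differs from the first partition of the running device, the running device should be reassigned to the second partition.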
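FIG. 4 is a schematic flowchart of a method 300 for scheduling running devices according to an embodiment of the present application. As shown in FIG. 4, the method 300 includes the following.
In 310, after executing the first task, the running device sends its updated resource information to the central control device.
Optionally, after finishing the task, the running device sends its updated resource information to the control device of its first partition, and that control device sends the updated resource information to the central control device.
Optionally, the running device may send its updated resource information directly to the central control device. The resource information includes the running device's CPU resources, disk array resources, solid-state drive (SSD) storage resources, network resources, heterogeneous acceleration resources, interference information, and so on after executing the task. The interference information mainly includes the interference strength produced by the tasks running on the running device.
In 320, the central control device receives the updated resource information and interference information of the running device.
In 330, the central control device determines a second partition for the target running device according to the relationship between the updated resource information and the value ranges of the multiple partitions.
Optionally, when the second partition that the central control device determines for the target running device after it executes the first task is the same as the first partition, the running device stays in its partition.
Optionally, when the second partition that the central control device determines for the target running device after it executes the first task differs from the first partition, the central control device updates the partition to which the target running device belongs from the first partition to the second partition.
Specifically, FIG. 5 is a schematic block diagram of dynamic partitioning of running devices in a method for scheduling running devices according to an embodiment of the present application. Running device 3 in the first partition finishes executing the first task, and the central control device re-partitions running device 3 according to its updated resource information or interference information. When the central control device determines that running device 3 belongs in the second partition, running device 3 is moved to the second partition; control device 1 of the first partition no longer manages running device 3, and control device 2 of the second partition manages it.
In 340, the central control device sends first indication information to the control device of the first partition, where the first indication information instructs the control device of the first partition to delete the information about the running device.
In 350, the control device of the first partition receives the first indication information.
In 360, the central control device sends second indication information to the running device, where the second indication information instructs the target running device to change the partition to which it belongs from the first partition to the second partition.
Optionally, after deleting the information about the running device, the control device sends third indication information to the central control device, indicating that the control device has deleted the resource information of the running device. After receiving the third indication information, the central control device sends the second indication information to the running device, instructing the target running device to change the partition to which it belongs from the first partition to the second partition.
In 370, the running device receives the second indication information.
In 380, the running device sends its updated resource information to the control device of the second partition.
In 390, the control device of the second partition receives the updated resource information of the running device.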
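The re-partitioning flow of steps 310 to 390 can be sketched with plain dictionaries standing in for the messages. This is an illustrative sketch only: the class and function names are hypothetical, resource information is reduced to a single CPU utilization figure, and the 30%/60% bounds are borrowed from the earlier CPU example.

```python
from typing import Dict


def partition_for(cpu_usage: float) -> str:
    """Map CPU utilization (percent) to a partition, using the 30%/60% example bounds."""
    if cpu_usage < 30:
        return "partition_1"
    if cpu_usage <= 60:
        return "partition_2"
    return "partition_3"


class ControlDevice:
    """Keeps the resource information of the running devices in one partition."""

    def __init__(self) -> None:
        self.resource_info: Dict[str, float] = {}

    def delete_device(self, device: str) -> None:
        # Steps 340-350: act on the first indication information.
        self.resource_info.pop(device, None)

    def register_device(self, device: str, cpu_usage: float) -> None:
        # Steps 380-390: receive the device's updated resource information.
        self.resource_info[device] = cpu_usage


def repartition(device: str, updated_cpu_usage: float,
                membership: Dict[str, str],
                controllers: Dict[str, ControlDevice]) -> str:
    """Steps 320-390 from the central control device's point of view."""
    old = membership[device]
    new = partition_for(updated_cpu_usage)      # step 330
    if new == old:
        return old                              # the device stays where it is
    controllers[old].delete_device(device)      # steps 340-350
    membership[device] = new                    # steps 360-370
    controllers[new].register_device(device, updated_cpu_usage)  # steps 380-390
    return new


controllers = {name: ControlDevice() for name in ("partition_1", "partition_2", "partition_3")}
controllers["partition_1"].register_device("running_device_3", 20.0)
membership = {"running_device_3": "partition_1"}

print(repartition("running_device_3", 45.0, membership, controllers))  # partition_2
```

After the call, control device 1 no longer holds any record of running device 3 and control device 2 holds its updated figure, which matches the hand-over shown in FIG. 5.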
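After the partition of the running device has been re-divided, the embodiments of the present invention also include the corresponding processes of the methods of FIG. 2 and FIG. 3 above; for brevity, details are not repeated here.
Therefore, in the embodiments of the present application, after executing the first task, the running device sends its updated resource information or interference information to the central control device, and the central control device re-divides the partition in which the running device is located according to the updated resource information and/or interference information. The central control device can thus detect the resource usage of the entire cluster in real time and use system resources reasonably.
Among existing resource management and scheduling solutions, in a hierarchical scheduling solution the central controller is responsible only for managing cluster resources and allocating them to the computing frameworks. Each computing framework schedules tasks according to the resources allocated to it, and no computing framework can detect the real-time resource usage of the entire cluster. In the method of the embodiments of the present application, after the target running device finishes executing the task, it sends its updated resource information to the central control device, so that the central control device can detect the resource usage of the entire cluster in real time. The central control device re-divides the partition in which the running device is located according to the updated resource information or interference information, schedules running devices reasonably, and uses system resources reasonably, thereby managing system resources by partition.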
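FIG. 6 is a schematic block diagram of a central control device 400 according to an embodiment of the present application. As shown in FIG. 6, the central control device 400 includes:
a sending module 410, configured to send a test task to the control devices of multiple partitions in a cluster, where the test task is a test task for a first task, and each of the multiple partitions includes at least one running device;
an obtaining module 420, configured to receive the test results of the test task sent by the control devices of the multiple partitions; and
a selecting module 430, configured to select, according to the test results, a first partition for executing the first task from the multiple partitions in the cluster, where the first task is a task to be executed.
The sending module 410 is further configured to send the first task to the control device of the first partition, so that the control device selects, from the first partition, a target running device to execute the first task.
Optionally, the selecting module 430 is specifically configured to: package the first task as the test task; and send the test task to the control device of each partition. The test results specifically include the test results for the packaged task to be executed.
Optionally, the test results include at least one of: a running state parameter value of the first task during testing, the maximum interference strength that the first task can withstand, and the interference strength that the first task imposes on the running device.
The selecting module 430 is specifically configured to:
select the first partition for executing the first task from the multiple partitions in the cluster according to at least one of: the running state parameter value of the first task during testing, the maximum interference strength that the first task can withstand, and the interference strength that the first task imposes on the running device.
Optionally, as shown in FIG. 7, the central control device further includes a dividing module 440, configured to divide the multiple running devices into the multiple partitions according to the resource information and/or interference information of the multiple running devices obtained by the obtaining module 420.
The dividing module 440 is further configured to allocate a control device to each partition.
Optionally, the obtaining module 420 is further configured to:
obtain resource information and/or interference information of multiple running devices in the cluster, where the resource information indicates the status of the resources available on a running device, and the interference information includes the interference strength that tasks on the running device impose on that running device.
Optionally, the obtaining module 420 is further configured to: when the target running device completes the first task, obtain the updated resource information of the target running device.
The dividing module 440 is further configured to update the partition to which the target running device belongs from the first partition to a second partition according to the updated resource information sent by the target running device and the resource information of the multiple running devices in the cluster.
Optionally, the dividing module 440 is specifically configured to assign each running device to the corresponding partition according to the relationship between the resource information parameter value and/or interference information of each of the multiple running devices and multiple value ranges.
Optionally, the obtaining module 420 is specifically configured to obtain the first task entered by a user through an interface of a first computing framework among multiple computing frameworks; and the sending module 410 is specifically configured to send the first task to the control device that controls the first partition by invoking the first computing framework.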
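FIG. 8 is a schematic block diagram of a control device 500 according to an embodiment of the present application. As shown in FIG. 8, the control device 500 includes:
a receiving module 510, configured to receive a test task sent by a central control device, where the control device is configured to manage at least one running device in a first partition among multiple partitions in a cluster, and the test task is a test task for a first task;
a selecting module 520, configured to select at least one running device in the partition to performance-test the test task.
The receiving module 510 is further configured to receive the performance test parameter values of the test task sent by the running device that performance-tested the test task.
The control device further includes a sending module 530, configured to send the performance test parameter values of the test task to the central control device, for the central control device to determine the first partition for the first task.
The receiving module 510 is further configured to receive the first task sent by the central control device, where the first task is a task to be executed.
The selecting module 520 is further configured to select, in the first partition, a target running device to execute the first task.
Optionally, the control device further includes a scheduling module, where the receiving module 510 is specifically configured to receive the first task sent by the central control device; the selecting module 520 is specifically configured to select, in the first partition, the target running device to execute the first task; and the scheduling module is specifically configured to schedule the target running device to execute the first task.
Optionally, the selecting module 520 is specifically configured to select the target running device from the at least one running device that performance-tested the first task, according to the performance test parameter values of the at least one running device that performance-tested the first task.
Optionally, the receiving module 510 is specifically configured to receive first indication information sent by the central control device, where the first indication information instructs the control device to delete the information about the target running device; and the sending module 530 is further configured to send second indication information to the central control device, where the second indication information indicates that the control device has deleted the information about the target running device.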
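FIG. 9 is a schematic block diagram of a running device 600 according to an embodiment of the present application. As shown in FIG. 9, the running device 600 includes:
a receiving module 610, configured to receive a test task sent by a control device;
a testing module 620, configured to test the test task to obtain the test result of the test task, where the control device is configured to manage a first partition in which the running device is located, and the first partition includes at least one running device; and
a sending module 630, configured to send the test result to the control device, for the control device to send the test result to the central control device, so that the central control device selects, from multiple partitions, a partition to execute the first task.
Optionally, the running device further includes an execution module, where the receiving module 610 is further configured to receive the first task sent by the control device, and the execution module is configured to execute the first task.
Optionally, the sending module 630 is further configured to: before the running device executes the first task sent by the control device, send the resource information and/or interference information of the running device to the central control device or to the control device to which the running device belongs, for the central control device to assign the running device to a partition.
Optionally, the receiving module 610 is further configured to receive first indication information sent by the central control device or by the control device to which the running device belongs, where the first indication information instructs the running device to update the partition to which it belongs from the first partition to the second partition; and the sending module 630 is further configured to send the resource information of the running device to the control device of the second partition.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution. The order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.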
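FIG. 10 is a schematic structural diagram of a communication device 700 according to an embodiment of the present application. As shown in FIG. 10, the communication device 700 includes a processor 710, a memory 720, and a transceiver 730. The memory 720 is configured to store instructions, and the processor 710 is configured to execute the instructions stored in the memory 720. The processor 710 can control the transceiver 730 to communicate externally. The processor 710, the memory 720, and the transceiver 730 communicate with one another over internal connection paths to transfer control and/or data signals.
Optionally, the communication device may be a central control device. When the communication device 700 is a central control device, the processor 710 in the communication device 700 can invoke the instructions in the memory 720 to implement the corresponding processes performed by the central control device in the methods of FIG. 2 to FIG. 5. For brevity, details are not repeated here.
Optionally, the communication device may also be a control device. When the communication device 700 is a control device, the processor 710 in the communication device 700 can invoke the instructions in the memory 720 to implement the corresponding processes performed by the control device in the methods of FIG. 2 to FIG. 5. For brevity, details are not repeated here.
Optionally, the communication device may also be a running device. When the communication device 700 is a running device, the processor 710 in the communication device 700 can invoke the instructions in the memory 720 to implement the corresponding processes performed by the running device in the methods of FIG. 2 to FIG. 5. For brevity, details are not repeated here.
In the embodiments of the present application, the processor may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. The processor may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL) device, or any combination thereof.
The memory may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be a random access memory (RAM), which serves as an external cache.
The embodiments of the present application provide a computer-readable medium for storing a computer program, where the computer program includes instructions for performing the methods of the embodiments of the present application described with reference to FIG. 2 to FIG. 10. The readable medium may be a ROM or a RAM, which is not limited in the embodiments of the present application.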
应理解,本文中术语“和/或”以及“A或B中的至少一种”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或” 的关系。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (33)

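  1. A method for scheduling a running device, characterized by comprising:
    sending a test task to control devices of multiple partitions in a cluster, wherein each of the multiple partitions comprises at least one running device;
    acquiring test results of the test task sent by the control devices of the multiple partitions;
    selecting, according to the test results, from the multiple partitions in the cluster, a first partition for executing a first task; and
    sending the first task to the control device of the first partition, so that the control device selects, from the first partition, a target running device to execute the first task.
  2. The method according to claim 1, characterized in that, before the sending a test task to control devices of multiple partitions in a cluster, the method further comprises:
    encapsulating the first task as the test task;
    sending the test task to the control devices of the multiple partitions.
  3. The method according to claim 2, characterized in that the test results comprise at least one of: a running state parameter value of the first task during testing, the maximum interference intensity that the first task can tolerate, and the interference intensity that the first task imposes on the running device.
  4. The method according to any one of claims 1 to 3, characterized in that, before the sending a test task to control devices of multiple partitions in a cluster, the method further comprises:
    acquiring resource information and/or interference information of multiple running devices in the cluster, wherein the resource information indicates the resources available for use in a running device, and the interference information comprises the interference intensity that tasks on a running device impose on that running device;
    dividing the multiple running devices into the multiple partitions according to the resource information and/or interference information of the multiple running devices;
    assigning a control device to each partition.
  5. The method according to claim 4, characterized in that the dividing the multiple running devices into the multiple partitions according to the resource information and/or interference information of the multiple running devices comprises:
    assigning each of the multiple running devices to its corresponding partition according to the correspondence between the parameter values comprised in that running device's resource information and/or the intensity parameter values comprised in its interference information and the value ranges of the multiple partitions.
  6. The method according to claim 5, characterized in that, when the target running device completes the task, the method further comprises:
    acquiring updated resource information and/or interference information of the target running device;
    updating the partition to which the target running device belongs from the first partition to a second partition according to the relationship between the updated resource information and/or interference information sent by the target running device and the value ranges of the multiple partitions.
  7. The method according to any one of claims 1 to 6, characterized in that, before the sending a test task to control devices of multiple partitions in a cluster, the method further comprises:
    acquiring the first task that a user inputs through an interface of a first computing framework among multiple computing frameworks.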
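  8. A method for scheduling a running device, characterized by comprising:
    receiving, by a control device, a test task sent by a central control device, wherein the control device is configured to manage at least one running device in a first partition among multiple partitions in a cluster;
    selecting at least one running device in the first partition to test the test task;
    sending the test task to the at least one running device in the first partition;
    receiving a test result of the test task sent by the running device that performed the performance test of the test task;
    sending the test result of the test task to the central control device, for the central control device to determine the first partition for a first task.
  9. The method according to claim 8, characterized in that the method further comprises:
    receiving the first task sent by the central control device;
    selecting, in the first partition, a target running device to execute the first task;
    scheduling the target running device to execute the first task.
  10. The method according to claim 9, characterized in that the selecting, in the first partition, a target running device to execute the first task comprises:
    selecting the target running device from among the at least one running device that tested the test task, according to the test results of the at least one running device that tested the test task.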
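  11. A method for scheduling a running device, characterized by comprising:
    receiving, by a running device, a test task sent by a control device, wherein the control device is configured to manage a first partition in which the running device is located, and the first partition comprises at least one running device;
    testing the test task to obtain a test result of the test task;
    sending the test result to the control device, for the control device to send the test result to a central control device, so that the central control device selects, from multiple partitions, a partition that executes a first task.
  12. The method according to claim 11, characterized in that the method further comprises:
    receiving the first task sent by the control device;
    executing the first task.
  13. The method according to claim 12, characterized in that, after the executing the first task, the method further comprises:
    sending updated resource information and/or interference information of the running device to the central control device or to the control device to which the running device belongs.
  14. The method according to any one of claims 11 to 13, characterized in that, before the testing the test task sent by the control device, the method further comprises:
    sending resource information and/or interference information of the running device to the central control device or to the control device to which the running device belongs.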
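  15. A central control device, characterized by comprising a sending module, an acquiring module, and a selection module:
    the sending module is configured to send a test task to control devices of multiple partitions in a cluster, wherein each of the multiple partitions comprises at least one running device;
    the acquiring module is configured to acquire test results of the test task sent by the control devices of the multiple partitions;
    the selection module is configured to select, according to the test results, from the multiple partitions in the cluster, a first partition for executing a first task;
    the sending module is further configured to:
    send the first task to the control device of the first partition, for the control device to select, from the first partition, a target running device to execute the first task.
  16. The central control device according to claim 15, characterized in that the selection module is specifically configured to:
    encapsulate the first task as the test task;
    send the test task to the control device of each partition.
  17. The central control device according to claim 15 or 16, characterized in that the test results comprise at least one of: a running state parameter value of the first task during testing, the maximum interference intensity that the first task can tolerate, and the interference intensity that the first task imposes on the running device.
  18. The central control device according to any one of claims 15 to 17, characterized in that the acquiring module is further configured to:
    acquire resource information and/or interference information of multiple running devices in the cluster, wherein the resource information indicates the resources available for use in a running device, and the interference information comprises the interference intensity that tasks on a running device impose on that running device;
    the central control device further comprises a dividing module, the dividing module being configured to divide the multiple running devices into the multiple partitions according to the resource information and/or interference information of the multiple running devices;
    the dividing module is further configured to assign a control device to each partition.
  19. The central control device according to claim 18, characterized in that the acquiring module is further configured to: when the target running device completes the first task, acquire updated resource information of the target running device;
    the dividing module is further configured to update the partition to which the target running device belongs from the first partition to a second partition according to the updated resource information sent by the target running device and the resource information of the multiple running devices in the cluster.
  20. The central control device according to claim 18 or 19, characterized in that the dividing module is specifically configured to:
    assign each of the multiple running devices to its corresponding partition according to the relationship between the parameter values of that running device's resource information and/or interference information and multiple value ranges.
  21. The central control device according to any one of claims 15 to 20, characterized in that the acquiring module is specifically configured to:
    acquire the first task that a user inputs through an interface of a first computing framework among multiple computing frameworks.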
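  22. A control device, characterized by comprising a receiving module, a selection module, a sending module, and a scheduling module:
    the receiving module is configured to receive a test task sent by a central control device, wherein the control device is configured to manage at least one running device in a first partition among multiple partitions in a cluster;
    the selection module is configured to select at least one running device in the first partition to test the test task;
    the sending module is configured to send the test task to the at least one running device in the first partition;
    the receiving module is further configured to: receive a test result of the test task sent by the running device that performed the performance test of the test task;
    the sending module is further configured to: send the test result of the test task to the central control device, for the central control device to determine the first partition for a first task.
  23. The control device according to claim 22, characterized in that
    the receiving module is specifically configured to: receive the first task sent by the central control device;
    the selection module is specifically configured to: select, in the first partition, a target running device to execute the first task;
    the scheduling module is specifically configured to: schedule the target running device to execute the first task.
  24. The control device according to claim 22 or 23, characterized in that the selection module is specifically configured to:
    select the target running device from among the at least one running device that performed the performance test of the test task, according to the test results of the at least one running device that performed the performance test of the test task.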
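  25. A running device, characterized by comprising a receiving module, a test module, and a sending module:
    the receiving module is configured to receive a test task sent by a control device, wherein the control device is configured to manage a first partition in which the running device is located, and the first partition comprises at least one running device;
    the test module is configured to test the test task so as to obtain a test result of the test task;
    the sending module is configured to send the test result to the control device, for the control device to send the test result to a central control device, so that the central control device selects, from multiple partitions, a partition that executes a first task.
  26. The running device according to claim 25, characterized in that the running device further comprises an execution module:
    the receiving module is further configured to receive the first task sent by the control device;
    the execution module is configured to execute the first task.
  27. The running device according to claim 26, characterized in that the sending module is specifically configured to:
    after the executing the first task, send updated resource information and/or interference information of the running device to the central control device or to the control device to which the running device belongs.
  28. The running device according to any one of claims 25 to 27, characterized in that the sending module is specifically configured to:
    before the testing the test task sent by the control device, send resource information and/or interference information of the running device to the central control device or to the control device to which the running device belongs.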
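  29. A system for cluster communication, characterized in that the system comprises a central control device, multiple control devices, and multiple running devices, wherein the multiple running devices are divided into multiple partitions, each of the multiple partitions comprises at least one running device, and each of the multiple control devices controls one of the multiple partitions;
    the central control device is configured to: send a test task to the multiple control devices; and receive test results sent by the multiple control devices, select, according to the test results, from the multiple partitions, a first partition for executing a first task, and send the first task to the control device of the first partition;
    each control device is configured to: receive the test task sent by the central control device, and send the test task to at least some of the running devices in the partition it controls; receive the test results sent by the at least some running devices it controls, and send the test results to the central control device; and receive the first task sent by the central control device, select a running device in the partition it controls to execute the first task, and schedule that running device in the partition it controls to execute the first task;
    each running device is configured to: receive the test task sent by a control device, test the test task to obtain a test result of the test task, and send the test result to its own control device; and execute the first task according to the scheduling of the control device.
  30. The system according to claim 29, characterized in that the central control device is further configured to: encapsulate the first task as the test task.
  31. The system according to claim 29 or 30, characterized in that the test results comprise at least one of: a running state parameter value of the first task during testing, the maximum interference intensity that the first task can tolerate, and the interference intensity that the first task imposes on the running device.
  32. The system according to any one of claims 29 to 31, characterized in that the central control device is further configured to: acquire resource information and/or interference information of the multiple running devices, wherein the resource information indicates the resources available for use in a running device, and the interference information comprises the interference intensity that tasks on a running device impose on that running device;
    divide the multiple running devices into the multiple partitions according to the resource information and/or interference information of the multiple running devices;
    assign, from among the multiple control devices, a control device to each partition.
  33. The system according to claim 32, characterized in that
    the running device is further configured to: send updated resource information and/or interference information of the running device to the central control device or to the control device to which the running device belongs;
    the control device is further configured to: receive the updated resource information and/or interference information of the running device sent by the running device, and send the updated resource information and/or interference information of the running device to the central control device;
    the central control device is further configured to: receive the updated resource information and/or interference information of the running device sent by the running device or by the control device to which the running device belongs; and update the partition to which the running device belongs according to the relationship between the updated resource information and/or interference information and the value ranges of the multiple partitions.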
PCT/CN2018/077191 2017-02-28 2018-02-26 Method and device for scheduling running devices, and running device WO2018157768A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710112027.3A CN108509256B (zh) 2017-02-28 2017-02-28 Method and device for scheduling running devices, and running device
CN201710112027.3 2017-02-28

Publications (1)

Publication Number Publication Date
WO2018157768A1 true WO2018157768A1 (zh) 2018-09-07

Family

ID=63369767

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/077191 WO2018157768A1 (zh) 2017-02-28 2018-02-26 Method and device for scheduling running devices, and running device

Country Status (2)

Country Link
CN (1) CN108509256B (zh)
WO (1) WO2018157768A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380108A (zh) * 2020-07-10 2021-02-19 中国航空工业集团公司西安飞行自动控制研究所 Fully automatic test method for partition space isolation

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
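CN109343947A (zh) * 2018-09-26 2019-02-15 郑州云海信息技术有限公司 Resource scheduling method and apparatus
CN109992506B (zh) * 2019-03-18 2024-05-31 平安科技(深圳)有限公司 Scheduling test method and apparatus, computer device, and storage medium
CN110196774A (zh) * 2019-05-06 2019-09-03 平安科技(深圳)有限公司 Scheduling method for testing different data servers and related apparatus
CN112416538B (zh) * 2019-08-20 2024-05-07 中国科学院深圳先进技术研究院 Multi-level architecture and management method for a distributed resource management framework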

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110313581A1 (en) * 2010-06-18 2011-12-22 General Electric Company Self-healing power grid and method thereof
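CN102866950A (zh) * 2012-09-13 2013-01-09 浪潮(北京)电子信息产业有限公司 Performance test method and test tool for a virtual server
CN104407910A (zh) * 2014-10-29 2015-03-11 华南理工大学 Method and system for monitoring the performance of a virtualized server
CN105117289A (zh) * 2015-09-30 2015-12-02 北京奇虎科技有限公司 Task allocation method, apparatus, and system based on a cloud testing platform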

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8122441B2 (en) * 2008-06-24 2012-02-21 International Business Machines Corporation Sharing compiler optimizations in a multi-node system
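CN102902592B (zh) * 2012-09-10 2016-04-20 曙光信息产业(北京)有限公司 Partition scheduling management method for cluster computing resources
CN103257896B (zh) * 2013-01-31 2016-09-28 南京理工大学连云港研究院 Max-D job scheduling method in a cloud environment
CN105868008B (zh) * 2016-03-23 2019-05-28 深圳大学 Resource scheduling method and identification system based on key resources and data preprocessing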

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380108A (zh) * 2020-07-10 2021-02-19 中国航空工业集团公司西安飞行自动控制研究所 Fully automatic test method for partition space isolation
CN112380108B (zh) * 2020-07-10 2023-03-14 中国航空工业集团公司西安飞行自动控制研究所 Fully automatic test method for partition space isolation

Also Published As

Publication number Publication date
CN108509256A (zh) 2018-09-07
CN108509256B (zh) 2021-01-15


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18761655

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18761655

Country of ref document: EP

Kind code of ref document: A1