WO2018076238A1 - Heterogeneous system, computing task allocation method and apparatus - Google Patents

Heterogeneous system, computing task allocation method and apparatus

Info

Publication number
WO2018076238A1
WO2018076238A1 (PCT/CN2016/103585)
Authority
WO
WIPO (PCT)
Prior art keywords
computing
module
task
computing module
occupancy rate
Prior art date
Application number
PCT/CN2016/103585
Other languages
English (en)
French (fr)
Inventor
黄勤业
陈云
罗会斌
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201680056714.1A priority Critical patent/CN108604193A/zh
Priority to PCT/CN2016/103585 priority patent/WO2018076238A1/zh
Publication of WO2018076238A1 publication Critical patent/WO2018076238A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the embodiments of the present invention relate to the field of data processing, and in particular, to a heterogeneous system, a computing task allocation method, and an apparatus.
  • A heterogeneous system is a computing system composed of computing units that use different types of instruction sets and architectures.
  • Common heterogeneous systems include: a Central Processing Unit (CPU) class computing module, a Graphics Processing Unit (GPU) class computing module, and a Field Programmable Gate Array (FPGA) class computing module. Since each type of computing module is good at certain kinds of computing tasks, it takes less time to execute the computing tasks it is good at.
  • When the heterogeneous system receives a computing task, the operating system in the heterogeneous system allocates the computing task to the corresponding computing module in the heterogeneous system according to the task type of the computing task. For example: if the task type of computing task A is a complex operation type, computing task A is allocated to the CPU class computing module for processing; if the task type of computing task B is a floating point type, computing task B is allocated to the GPU class computing module for processing; if the task type of computing task C is a parallel operation type, computing task C is allocated to the FPGA class computing module for processing.
  • the present invention provides a heterogeneous system, a computing task allocation method and apparatus.
  • the technical solution is as follows:
  • In a first aspect, an embodiment of the present invention provides a computing task allocation method. Because the heterogeneous system considers only the task type of a computing task when allocating computing tasks, the computational efficiency of the entire heterogeneous system may be reduced. In order to fully consider the resource utilization of each computing module, the method of allocating computing tasks is improved.
  • As a possible implementation of the present application, the computing task allocation method includes: determining, according to the task type of the computing task to be allocated, at least two computing modules having the capability of executing the computing task from the n computing modules included in the heterogeneous system; predicting the time overhead of executing the computing task on each of the at least two computing modules; obtaining the resource occupancy rate of each computing module; determining a target computing module from the at least two computing modules according to the time overhead and the resource occupancy rate; and allocating the computing task to the target computing module, the computing task being executed by the target computing module.
  • According to the task type of the computing task, the present application predicts the time overhead of the computing task on each computing module and obtains the resource occupancy rate of each computing module. When determining the computing module used to execute the computing task, both the time overhead of the computing task on the computing module and the resource usage of the computing module are considered. This helps to solve the problem that considering only the task type of the computing task when allocating computing tasks may reduce the computational efficiency of the entire heterogeneous system, and achieves the effect of comprehensively considering the time overhead and resource usage of each computing module when allocating computing tasks, thereby improving the computational efficiency of the heterogeneous system as a whole.
  • In a first possible implementation of the first aspect, determining the target computing module from the at least two computing modules according to the time overhead and the resource occupancy rate includes: calculating a weighted sum of each computing module according to the time overhead of each computing module and the resource occupancy rate of each computing module; and determining, as the target computing module, the computing module whose weighted sum does not exceed a predetermined threshold and which has the minimum time overhead. In this way, on the premise of considering the resource usage of each computing module, the computing module with the minimum time overhead is preferentially determined as the target computing module, which is beneficial to giving full play to the computing performance of the heterogeneous system.
  • In a second possible implementation of the first aspect, determining, as the target computing module, the computing module whose weighted sum does not exceed the predetermined threshold and which has the minimum time overhead includes: detecting whether the weighted sum of each computing module exceeds the predetermined threshold; and if the weighted sum of at least one computing module does not exceed the predetermined threshold, determining the computing module with the minimum time overhead among them as the target computing module.
  • In a third possible implementation of the first aspect, determining, as the target computing module, the computing module whose weighted sum does not exceed the predetermined threshold and which has the minimum time overhead includes: determining a first computing module having the minimum time overhead; detecting whether the weighted sum of the first computing module exceeds the predetermined threshold; and if the weighted sum of the first computing module does not exceed the predetermined threshold, determining the first computing module as the target computing module.
  • In a fourth possible implementation of the first aspect, determining the target computing module from the at least two computing modules according to the time overhead and the resource occupancy rate includes: calculating a weighted sum of each computing module according to the time overhead of each computing module and the resource occupancy rate of each computing module; and determining the computing module with the smallest weighted sum as the target computing module. Preferentially determining the computing module with the smallest weighted sum as the target computing module enables the computing task to be executed as soon as possible, thereby reducing the waiting time of the computing task and improving the computational efficiency of the heterogeneous system.
  • In a fifth possible implementation of the first aspect, the weighted sum of each computing module is calculated according to the time overhead of each computing module and the resource occupancy rate of each computing module by the following formula:
  • Y = k1α1 + k2α2
  • where Y is the weighted sum of each computing module, α1 is the resource occupancy rate of each computing module, k1 is the weight corresponding to the resource occupancy rate, α2 is the time overhead of each computing module executing the computing task, and k2 is the weight corresponding to the time overhead.
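  • As a minimal illustration of the weighted sum described above, the per-module score can be computed as in the following sketch; the function name, weights, and example values are illustrative assumptions, not taken from the patent:

```python
# Minimal sketch of the weighted sum Y = k1*a1 + k2*a2 described above.
# Weights and the candidate list are illustrative assumptions.

def weighted_sum(resource_occupancy: float, time_overhead: float,
                 k1: float = 0.5, k2: float = 0.5) -> float:
    """Combine resource occupancy (alpha1) and predicted time overhead (alpha2)."""
    return k1 * resource_occupancy + k2 * time_overhead

# Example: three candidate modules with (resource occupancy, predicted time in seconds).
candidates = {
    "CPU":  (0.70, 0.01),
    "GPU":  (0.20, 0.02),
    "FPGA": (0.40, 0.04),
}
scores = {name: weighted_sum(occ, t) for name, (occ, t) in candidates.items()}
print(scores)
```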
  • In a sixth possible implementation of the first aspect, the at least two computing modules include at least two of a CPU class computing module, a GPU class computing module, and an FPGA class computing module.
  • In a seventh possible implementation of the first aspect, the resource occupancy rate includes the computing resource occupancy rate and/or the communication resource occupancy rate of the computing module.
  • In an eighth possible implementation of the first aspect, the computing module includes a CPU class computing module implemented by a network-on-chip (NoC), and obtaining the resource occupancy rate of the computing module includes: reading the cache occupancy rate of each on-chip router of the NoC, where the cache occupancy rate is used to represent the communication resource occupancy on the NoC and the cache occupancy rate of each on-chip router is periodically collected by a specified CPU on the NoC; and summing the cache occupancy rates to obtain a total cache occupancy rate, which is determined as the resource occupancy rate of the NoC.
  • In a ninth possible implementation of the first aspect, the computing module includes a GPU class computing module, and obtaining the resource occupancy rate of the computing module includes: obtaining the device queue occupancy rate on the GPU class computing module; and determining the device queue occupancy rate as the resource occupancy rate of the GPU class computing module.
  • In a tenth possible implementation of the first aspect, the computing module includes an FPGA class computing module, and obtaining the resource occupancy rate of the computing module includes: when the computing resources used for computing the computing task are located on the same FPGA, obtaining the resource occupancy rate on that FPGA as the resource occupancy rate of the FPGA class computing module; when the computing resources used for computing the computing task are located on different FPGAs, obtaining the resource occupancy rate of each FPGA and the transmission overhead between the different FPGAs as the resource occupancy rate of the FPGA class computing module; and when the computing resources used for computing the computing task are located on different FPGAs on different servers, obtaining the resource occupancy rate of each FPGA and the transmission overhead between the different servers as the resource occupancy rate of the FPGA class computing module.
  • In a second aspect, an embodiment of the present invention provides a computing task allocation apparatus. The computing task allocation apparatus includes at least one unit, and the at least one unit is configured to implement the computing task allocation method provided in the first aspect or any possible implementation of the first aspect.
  • In a third aspect, an embodiment of the present invention provides a heterogeneous system. The heterogeneous system includes a scheduling module, a memory, and n computing modules, where n is an integer greater than 1, and the scheduling module is configured to implement the computing task allocation method provided in the first aspect or any possible implementation of the first aspect.
  • In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium, which stores an executable program for implementing the computing task allocation method provided in the first aspect or any possible implementation of the first aspect.
  • FIG. 1 is a schematic structural diagram of an implementation environment of a computing task allocation method according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of a heterogeneous system according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of a computing task allocation method according to an embodiment of the present invention;
  • FIG. 4A is a flowchart of a computing task allocation method according to an embodiment of the present invention;
  • FIG. 4B is a flowchart of a computing task allocation method according to another embodiment of the present invention;
  • FIG. 5 is a flowchart of a computing task allocation method according to another embodiment of the present invention;
  • FIG. 6 is a flowchart of a computing task allocation method according to another embodiment of the present invention;
  • FIG. 7A is a flowchart of some steps of a computing task allocation method according to another embodiment of the present invention;
  • FIG. 7B is a schematic structural diagram of a NoC according to an embodiment of the present invention.
  • FIG. 8A is a schematic structural diagram of an FPGA according to an embodiment of the present invention.
  • FIG. 8B is a schematic structural diagram of an FPGA according to another embodiment of the present invention.
  • FIG. 8C is a schematic structural diagram of an FPGA according to another embodiment of the present invention.
  • FIG. 9 is a flowchart of some steps of a computing task allocation method according to an embodiment of the present invention.
  • FIG. 10 is a structural block diagram of a computing task allocation apparatus according to an embodiment of the present invention.
  • "Multiple" as referred to herein means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate the three cases where only A exists, both A and B exist, and only B exists. The character "/" generally indicates an "or" relationship between the associated objects.
  • FIG. 1 is a schematic structural diagram of an implementation environment of a computing task allocation method according to an embodiment of the present invention.
  • The implementation environment includes a database 110, a database operation server 120, and a client 130.
  • Database 110 is used to store data.
  • the database operations server 120 is for processing data stored in the database 110.
  • the database operations server 120 employs heterogeneous systems for acceleration.
  • The database operation server 120 is a server or a server cluster that is implemented using a heterogeneous system.
  • the client 130 is a device that sends a calculation task for data to the database operation server 120, and requests the database operation server 120 to process the calculation task, such as a mobile phone, a tablet computer, a personal computer, and the like.
  • computing tasks include: database operations such as data query operations, data sort operations, and data sum operations.
  • the specific types of computing tasks can be different in different implementation scenarios.
  • the database 110 is connected to the database operations server 120 via a network.
  • the database operations server 120 is connected to the client 130 via a wired network or a wireless network.
  • the wireless or wired network described above uses standard communication techniques and/or protocols.
  • The network is usually the Internet, but it can also be any network, including but not limited to any combination of a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a mobile, wired, or wireless network, a private network, or a virtual private network.
  • Technologies and/or formats including Hypertext Markup Language (HTML), Extensible Markup Language (XML), and the like are used to represent the data exchanged over the network.
  • In addition, all or some of the links can be encrypted using conventional encryption technologies such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec).
  • the above described data communication techniques may also be replaced or supplemented using custom and/or dedicated data communication techniques.
  • FIG. 2 shows a schematic structural diagram of a heterogeneous system 200 provided by an exemplary embodiment of the present invention.
  • the heterogeneous system 200 includes a scheduling module 210, a memory 220, a network interface 230, a GPU class computing module 240, a CPU class computing module 250, and an FPGA class computing module 260.
  • the GPU class computing module 240, the CPU class computing module 250, and the FPGA class computing module 260 are three computing modules in the heterogeneous system 200.
  • the heterogeneous system 200 includes at least two computing modules of the GPU class computing module 240, the CPU class computing module 250, and the FPGA class computing module 260.
  • The scheduling module 210 can be implemented by a CPU, a GPU, or an FPGA. In this embodiment, the scheduling module 210 is implemented by a CPU as an example.
  • the scheduling module 210 includes one or more processing cores.
  • The scheduling module 210 executes various functional applications and data processing by running software programs and modules, for example: determining, according to the task type of the computing task to be allocated, at least two computing modules having the capability of executing the computing task from the n computing modules; predicting the time overhead of executing the computing task on each of the at least two computing modules; obtaining the resource occupancy rate of each computing module; determining the target computing module from the at least two computing modules according to the time overhead and the resource occupancy rate; and allocating the computing task to the target computing module.
  • the memory 220 is used to store software programs and modules.
  • The memory 220 can store an operating system 21 and an application module 22 required for at least one function.
  • the operating system 21 can be an operating system such as Real Time eXecutive (RTX), LINUX, UNIX, WINDOWS, or OS X.
  • the application module 22 may include a determination module, a prediction module, an acquisition module, an allocation module, and the like.
  • The determining module is configured to determine, according to the task type of the computing task to be allocated, at least two computing modules having the capability of executing the computing task from the n computing modules; the predicting module is configured to predict the time overhead of the computing task on each of the at least two computing modules; the acquiring module is configured to obtain the resource occupancy rate of each computing module; and the allocating module is configured to allocate the computing task to the target computing module, which is used to execute the computing task.
  • The memory 220 can be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable programmable read only memory (EPROM), programmable read only memory (PROM), read only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
  • There may be multiple network interfaces 230, which are used to acquire data from the database 110 for the heterogeneous system 200, receive computing tasks, and communicate with other devices.
  • the memory 220, the network interface 230, the GPU type calculation module 240, the CPU type calculation module 250, and the FPGA type calculation module 260 are respectively connected to the scheduling module 210.
  • the memory 220, the network interface 230, the GPU class computing module 240, the CPU class computing module 250, and the FPGA class computing module 260 are respectively connected to the scheduling module 210 through a bus; or
  • the memory 220, the network interface 230, the GPU class computing module 240, the CPU class computing module 250, and the FPGA class computing module 260 are respectively connected to the scheduling module 210 through a network.
  • The structure of the heterogeneous system 200 illustrated in FIG. 2 does not constitute a limitation on the heterogeneous system 200; the heterogeneous system 200 may include more or fewer components than those illustrated, combine certain components, or use a different arrangement of components.
  • the heterogeneous system 200 includes n kinds of calculation modules, and n is an integer greater than or equal to 2. That is, in some embodiments, heterogeneous system 200 can include two types of computing modules; in other embodiments, heterogeneous system 200 can include four types of computing modules.
  • FIG. 3 shows a flowchart of a computing task allocation method provided by an exemplary embodiment of the present invention.
  • the computing task allocation method is applied to the heterogeneous system 200 shown in FIG. 2, and the method includes:
  • Step 301 Determine, according to the task type of the computing task to be allocated, at least two computing modules having the capability of executing the computing task from the n computing modules.
  • When the user processes data on the client, the client generates a corresponding computing task and sends the computing task to the heterogeneous system, and the scheduling module in the heterogeneous system receives the computing task.
  • The computing task is a task of processing data stored in the database, such as querying data, sorting data, updating data, deleting data, filtering data, and performing mathematical operations on data, where the mathematical operations include summation, difference, product, quotient, remainder, average, maximum, and minimum.
  • the task types of the computing task include query, sort, mathematical operation, filtering, comparison, update, deletion, and the like.
  • the task type of the query operation is a data query.
  • the scheduling module determines at least two computing modules having the capability to perform computing tasks from the n computing modules of the heterogeneous system.
  • For example, the CPU class computing module and the GPU class computing module have the capability of executing a computing task of task type A; for another example, the CPU class computing module, the GPU class computing module, and the FPGA class computing module all have the capability of executing a computing task of task type B.
  • some task types can only be executed by one computing module.
  • the computing task of task type C is only suitable for execution by the CPU class computing module. This embodiment does not discuss the computing tasks of these task types.
  • Step 302 Predict the time overhead of the computing task on each of the at least two computing modules.
  • For each computing module that has the capability of executing the computing task, the time overhead of executing a computing task of a certain task type differs between computing modules.
  • the time overhead is used to characterize how long it takes for the computing module to perform computing tasks.
  • the scheduling module predicts a time overhead of the computing task on each of the at least two computing modules according to the task type of the computing task.
  • The computing modules having the capability of executing the computing task include at least two of a CPU class computing module, an FPGA class computing module, and a GPU class computing module.
  • the correspondence between the task type, the calculation module type, and the time cost is stored in the heterogeneous system.
  • Task type | Computing module type | Time overhead
    Type A    | CPU                   | 0.01 seconds
    Type A    | GPU                   | 0.20 seconds
    Type A    | FPGA                  | 0.04 seconds
    Type B    | CPU                   | 0.03 seconds
    Type B    | GPU                   | 0.14 seconds
    Type B    | FPGA                  | 0.05 seconds
  • Table 1 schematically shows the correspondence between the task type, the calculation module type, and the time overhead.
  • The scheduling module predicts the time overhead of the computing task on each computing module according to the preset correspondence.
  • the scheduling module queries the time cost of the computing task on each computing module according to the task type of the computing task in the preset correspondence.
  • the preset correspondence stores the correspondence between the task type, the calculation module type, and the time overhead.
  • The larger the time overhead predicted by the scheduling module in the heterogeneous system, the more time the computing task takes; the smaller the predicted time overhead, the less time the computing task takes.
  • For example, if the computing modules include a GPU class computing module, an FPGA class computing module, and a CPU class computing module, and the task type of the computing task is a data query, the scheduling module predicts that the time overhead of the computing task on the CPU class computing module is 0.01 seconds, the time overhead on the GPU class computing module is 0.02 seconds, and the time overhead on the FPGA class computing module is 0.04 seconds.
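  • A minimal sketch of this prediction step, assuming the correspondence of Table 1 is stored as a simple in-memory lookup table (the structure and values below are illustrative, not the patent's actual storage format):

```python
# Hypothetical lookup of predicted time overhead by (task type, module type),
# mirroring the preset correspondence described above; values are illustrative.
TIME_OVERHEAD = {
    ("query", "CPU"): 0.01,
    ("query", "GPU"): 0.02,
    ("query", "FPGA"): 0.04,
}

def predict_time_overhead(task_type: str, module_type: str) -> float:
    """Return the stored time overhead for the given task/module pair."""
    return TIME_OVERHEAD[(task_type, module_type)]

assert predict_time_overhead("query", "CPU") == 0.01
```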
  • Step 303 Obtain a resource occupancy rate of each computing module.
  • the resource occupancy rate is used to indicate the usage of resources in the computing module.
  • the resources of the computing module include: a computing resource, or a communication resource, or a computing resource and a communication resource.
  • the scheduling module in the heterogeneous system acquires the resource occupancy rate of each computing module.
  • step 302 and step 303 can be performed simultaneously.
  • Step 304 Determine a target computing module from at least two computing modules according to a time overhead and a resource occupancy rate.
  • the scheduling module in the heterogeneous system determines the target computing module according to the time overhead and the resource occupancy rate.
  • step 305 the computing task is assigned to the target computing module.
  • the target calculation module is used to perform calculation tasks.
  • the scheduling module in the heterogeneous system assigns the computing task to the target computing module, and the target computing module performs the computing task.
  • In summary, the computing task allocation method provided in this embodiment predicts the time overhead of the computing task on each computing module according to the task type of the computing task and obtains the resource occupancy rate of each computing module. When determining the computing module used to execute the computing task, both the time overhead of the computing task on the computing module and the resource usage of the computing module are considered, which helps to solve the problem that considering only the task type of the computing task when allocating computing tasks may reduce the computational efficiency of the entire heterogeneous system, and achieves the effect of improving the computational efficiency of the heterogeneous system as a whole by comprehensively considering the time overhead and resource usage of each computing module when allocating computing tasks.
  • Step 304 in the embodiment of Figure 3 has a number of possible implementations, two embodiments being provided herein.
  • step 304 can be implemented instead as step 304a and step 304b, as shown in FIG. 4A:
  • Step 304a calculating a weighted sum of each computing module according to a time cost of each computing module and a resource occupancy rate of each computing module;
  • The scheduling module calculates the weighted sum of each computing module according to the following formula:
  • Y = k1α1 + k2α2
  • where Y is the weighted sum of each computing module, α1 is the resource occupancy rate of each computing module, k1 is the weight corresponding to the resource occupancy rate, α2 is the time overhead of each computing module executing the computing task, and k2 is the weight corresponding to the time overhead.
  • That is, the scheduling module calculates the weighted sum of the first type of computing module according to the time overhead and the resource occupancy rate of the first type of computing module; calculates the weighted sum of the second type of computing module according to the time overhead and the resource occupancy rate of the second type of computing module; and calculates the weighted sum of the third type of computing module according to the time overhead and the resource occupancy rate of the third type of computing module.
  • step 304b the calculation module whose weighted sum does not exceed the predetermined threshold and has the smallest time overhead is determined as the target calculation module.
  • The target computing module needs to satisfy two conditions in this embodiment:
  • Condition 1: the weighted sum does not exceed the predetermined threshold, which indicates that the target computing module is not in a busy state;
  • Condition 2: the time overhead is the minimum, which indicates that the target computing module is a computing module that is good at executing the computing task.
  • The scheduling module determines the computing module that satisfies both conditions as the target computing module.
  • In this embodiment, by determining as the target computing module the computing module whose weighted sum does not exceed the predetermined threshold and which has the minimum time overhead, the computing module with the minimum time overhead is preferentially selected on the premise of considering the resource usage of each computing module, which is beneficial to giving full play to the computing performance of the heterogeneous system.
  • step 304 can be implemented instead as step 304a and step 304c, as shown in FIG. 4B:
  • Step 304a calculating a weighted sum of each computing module according to a time cost of each computing module and a resource occupancy rate of each computing module;
  • step 304c the calculation module having the smallest weighted sum is determined as the target calculation module.
  • the calculation module with the smallest weighted sum is the calculation module that is most suitable for performing the calculation task in terms of both time cost and resource occupancy.
  • the scheduling module determines the computing module as a target computing module.
  • Preferentially determining the computing module with the smallest weighted sum as the target computing module enables the computing task to be executed as soon as possible, thereby reducing the waiting time of the computing task.
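  • The two selection strategies of step 304b and step 304c can be sketched as follows; the data structures, threshold value, and example numbers are assumptions for illustration only:

```python
# Sketch of the two target-selection strategies described above.
# `modules` maps a module name to (weighted_sum, time_overhead); the threshold is assumed.

def select_min_time_under_threshold(modules: dict, threshold: float):
    """Step 304b: among modules whose weighted sum does not exceed the
    threshold, pick the one with the smallest time overhead."""
    eligible = {m: v for m, v in modules.items() if v[0] <= threshold}
    if not eligible:
        return None  # all modules are busy; handled separately (e.g. step 507)
    return min(eligible, key=lambda m: eligible[m][1])

def select_min_weighted_sum(modules: dict):
    """Step 304c: pick the module with the smallest weighted sum."""
    return min(modules, key=lambda m: modules[m][0])

modules = {"CPU": (0.36, 0.01), "GPU": (0.11, 0.02), "FPGA": (0.22, 0.04)}
print(select_min_time_under_threshold(modules, threshold=0.30))  # -> "GPU"
print(select_min_weighted_sum(modules))                          # -> "GPU"
```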
  • For step 304b in FIG. 4A, since the target computing module needs to satisfy the two conditions at the same time, condition 1 can be checked before condition 2, or condition 2 can be checked before condition 1, so step 304b has two different implementations, which are described below using the embodiment of FIG. 5 and the embodiment of FIG. 6.
  • FIG. 5 shows a flowchart of a computing task allocation method according to an exemplary embodiment of the present invention. This embodiment is described by applying the computing task allocation method to the heterogeneous system shown in FIG. 2.
  • The computing task allocation method includes the following steps:
  • Step 501 Determine, according to the task type of the computing task to be allocated, at least two computing modules having the capability of performing the computing task from the n computing modules.
  • the scheduling module acquires the task type of the computing task to be allocated.
  • the scheduling module determines, in the n computing modules of the heterogeneous system, at least two computing modules having the capability of executing the computing task from the n computing modules according to the task type of the computing task to be allocated.
  • Step 502 Predict the time overhead that the computing task performs on each of the at least two computing modules.
  • the scheduling module predicts a time overhead performed by the computing task on each of the at least two computing modules according to the task type of the computing task.
  • Step 503 Obtain a resource occupancy rate of each computing module.
  • the resource occupancy rate is used to indicate the usage of resources in the computing module.
  • The resources of the computing module include: computing resources, or communication resources, or both computing resources and communication resources.
  • the scheduling module in the heterogeneous system acquires the resource occupancy rate of each computing module.
  • step 502 and step 503 may be performed at the same time; or, step 502 may be performed before step 503; or step 503 may be performed before step 502, which is not limited in this embodiment.
  • Step 504 Calculate a weighted sum of each computing module according to a time cost of each computing module and a resource occupancy rate of each computing module.
  • The scheduling module calculates the weighted sum of each computing module according to the following formula:
  • Y = k1α1 + k2α2
  • where Y is the weighted sum of each computing module, α1 is the resource occupancy rate of each computing module, k1 is the weight corresponding to the resource occupancy rate, α2 is the time overhead of each computing module executing the computing task, and k2 is the weight corresponding to the time overhead.
  • That is, the scheduling module calculates the weighted sum of the first type of computing module according to the time overhead and the resource occupancy rate of the first type of computing module; calculates the weighted sum of the second type of computing module according to the time overhead and the resource occupancy rate of the second type of computing module; and calculates the weighted sum of the third type of computing module according to the time overhead and the resource occupancy rate of the third type of computing module.
  • Step 505 Detect whether the weighted sum of each computing module exceeds the predetermined threshold.
  • If the weighted sum of at least one computing module does not exceed the predetermined threshold, proceed to step 506; if the weighted sums of all the computing modules exceed the predetermined threshold, proceed to step 507.
  • Step 506 If there is at least one computing module whose weighted sum does not exceed the predetermined threshold, determine the computing module having the smallest time overhead among them as the target computing module.
  • For example, if the weighted sum of the second computing module and the weighted sum of the third computing module both do not exceed the predetermined threshold, and the time overhead of the second computing module is less than the time overhead of the third computing module, then the second computing module, which has the smallest time overhead, is determined as the target computing module.
  • Step 507 If the weighted sums of all the computing modules exceed the predetermined threshold, abandon the current allocation, or randomly determine a computing module as the target computing module, or determine the target computing module in another manner. For example, the computing task can be randomly allocated to a certain computing module for execution, or allocated to the computing module with the minimum time overhead, or allocated to the computing module with the minimum resource occupancy rate. The processing manner adopted in step 507 is not limited in this embodiment.
  • step 508 the computing task is assigned to the target computing module.
  • the target calculation module is used to perform the calculation task.
  • the scheduling module in the heterogeneous system assigns the computing task to the target computing module, and the target computing module performs the computing task.
  • In summary, by using the weighted sum to perform a preliminary screening of the computing modules, computing modules with relatively low time overhead and resource occupancy can be selected, and the computing module with the smallest time overhead is then determined from the preliminarily selected computing modules as the target computing module, thereby improving the computational efficiency of the heterogeneous system and giving full play to its computing performance.
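  • Putting steps 504 to 508 together, a sketch of this variant, using the minimum-time-overhead fallback mentioned as one option for step 507, might look as follows (all names, weights, and the threshold are illustrative assumptions):

```python
# Sketch of the FIG. 5 variant: compute all weighted sums first, filter by the
# predetermined threshold, then take the minimum time overhead. If every module
# exceeds the threshold, fall back to the module with the minimum time overhead
# (one of the options mentioned for step 507). Names and values are assumptions.

def assign_fig5(candidates, k1, k2, threshold):
    """candidates: {name: (resource_occupancy, time_overhead)}."""
    weighted = {name: k1 * occ + k2 * t for name, (occ, t) in candidates.items()}
    eligible = [name for name, y in weighted.items() if y <= threshold]
    pool = eligible if eligible else list(candidates)        # step 507 fallback
    return min(pool, key=lambda name: candidates[name][1])   # smallest time overhead

print(assign_fig5({"CPU": (0.7, 0.01), "GPU": (0.2, 0.02)}, 0.5, 0.5, 0.3))  # -> "GPU"
```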
  • FIG. 6 shows a flowchart of a computing task allocation method according to another exemplary embodiment of the present invention. This embodiment is described by applying the computing task allocation method to the heterogeneous system shown in FIG. 2.
  • The computing task allocation method includes the following steps:
  • Step 601 Determine, according to the task type of the computing task to be allocated, at least two computing modules having the capability of performing the computing task from the n computing modules.
  • the scheduling module acquires the task type of the computing task to be allocated.
  • the scheduling module determines, in the n computing modules of the heterogeneous system, at least two computing modules having the capability of executing the computing task from the n computing modules according to the task type of the computing task to be allocated.
  • a computing module having the ability to perform the computing task includes: a CPU class computing module and an FPGA class computing module.
  • Step 602 Predict the time overhead that the computing task performs on each of the at least two computing modules.
  • the scheduling module predicts a time overhead performed by the computing task on each of the at least two computing modules according to the task type of the computing task.
  • step 603 the computing module with the smallest time overhead is determined as the first computing module.
  • For example, if the time overhead of computing task A on the CPU class computing module is 0.1 seconds, the time overhead on the GPU class computing module is 1 second, and the time overhead on the FPGA class computing module is 0.9 seconds, then the computing module with the smallest time overhead is the CPU class computing module, and the CPU class computing module is the first computing module.
  • Step 604 Obtain a resource occupancy rate of each computing module.
  • the resources in the computing module include: computing resources, or communication resources, or computing resources and communication resources.
  • the resource occupancy rate is the computing resource occupancy rate of the computing module, or the communication resource occupancy rate, or the total occupancy rate of the computing resource occupancy rate and the communication resource occupancy rate.
  • the resource occupancy is equal to the resources that have been occupied divided by the total available resources.
  • Step 602 and step 604 can be performed simultaneously.
  • Step 605 Calculate a weighted sum of the first computing module according to a time cost of the first computing module and a resource occupancy rate of the first computing module.
  • the weighted sum refers to a value obtained by summing the time overhead and the resource occupancy according to their respective weights.
  • The weighted sum of the first computing module is calculated according to the following formula:
  • Y = k1α1 + k2α2
  • where Y is the weighted sum of the first computing module, α1 is the resource occupancy rate, k1 is the weight corresponding to the resource occupancy rate, α2 is the time overhead, and k2 is the weight corresponding to the time overhead.
  • Step 606 Detect whether the weighted sum of the first computing module exceeds a predetermined threshold.
  • the predetermined threshold is preset, and the predetermined threshold is generally set to an empirical value.
  • When it is detected that the weighted sum of the first computing module does not exceed the predetermined threshold, step 607 is performed; when it is detected that the weighted sum of the first computing module exceeds the predetermined threshold, step 608 is performed.
  • Step 607 Determine the first calculation module as the target calculation module.
  • Step 608 If the predetermined threshold is exceeded, the other computing modules of the at least two computing modules except the first computing module are determined as the second computing module.
  • For example, if the first computing module is a CPU class computing module, and the computing modules capable of executing the computing task include a CPU class computing module and an FPGA class computing module, then the FPGA class computing module is determined as the second computing module.
  • Step 609 Calculate a weighted sum of the second computing module according to a time cost of the second computing module and a resource occupancy rate of the second computing module.
  • the second weighted sum refers to a value obtained by summing the time overhead and the resource occupancy according to respective weights.
  • the type of the second calculation module is different from the type of the first calculation module.
  • The second weighted sum of the second computing module is calculated according to the following formula:
  • L = k3α3 + k4α4
  • where L is the second weighted sum of the second computing module, α3 is the resource occupancy rate, k3 is the weight corresponding to the resource occupancy rate, α4 is the time overhead, and k4 is the weight corresponding to the time overhead.
  • Step 610 Detect whether the weighted sum of the second computing module is less than a weighted sum of the first computing module.
  • If the weighted sum of the second computing module is smaller than the weighted sum of the first computing module, step 611 is performed; if the weighted sum of the second computing module is not smaller than the weighted sum of the first computing module, step 607 is performed, that is, the first computing module is still determined as the target computing module.
  • Step 611 If the weighted sum of a second computing module is smaller than the weighted sum of the first computing module, determine the second computing module corresponding to the smallest second weighted sum as the target computing module.
  • Step 612 assigning a computing task to the target computing module.
  • the target calculation module is used to perform calculation tasks.
  • the scheduling module of the heterogeneous system assigns the computing task to the target computing module, and the target computing module performs the computing task.
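  • A corresponding sketch of the FIG. 6 variant, which first picks the module with the smallest time overhead and only falls back to comparing weighted sums when that module is busy, is given below; the names, weights, and threshold are illustrative assumptions:

```python
# Sketch of the FIG. 6 variant: pick the module with the smallest time overhead
# first; if its weighted sum is within the threshold, use it directly, otherwise
# switch only if some other module has a smaller weighted sum. Illustrative only.

def assign_fig6(candidates, k1, k2, threshold):
    """candidates: {name: (resource_occupancy, time_overhead)}."""
    def weighted(name):
        occ, t = candidates[name]
        return k1 * occ + k2 * t

    first = min(candidates, key=lambda name: candidates[name][1])  # steps 602-603
    if weighted(first) <= threshold:                               # steps 606-607
        return first
    others = [n for n in candidates if n != first]                 # step 608
    best_other = min(others, key=weighted)                         # steps 609-610
    return best_other if weighted(best_other) < weighted(first) else first  # 611/607
```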
  • In summary, the computing task allocation method provided in this embodiment predicts the time overhead of the computing task on each computing module according to the task type of the computing task and obtains the resource occupancy rate of each computing module. When determining the computing module used to execute the computing task, both the time overhead of the computing task on the computing module and the resource usage of the computing module are considered, which helps to solve the problem that considering only the task type of the computing task when allocating computing tasks may reduce the computational efficiency of the entire heterogeneous system, and achieves the effect of comprehensively considering the time overhead and resource usage of each computing module and improving the computational efficiency of the heterogeneous system. In addition, by determining the target computing module according to the weighted sum of the first computing module and the weighted sum of the second computing module, the heterogeneous system is prevented from allocating computing tasks to computing modules that are not well suited to handle them, which helps to improve the data processing efficiency of the heterogeneous system.
  • The heterogeneous system needs to obtain the resource occupancy rate of each computing module. Taking computing modules including a CPU class computing module, a GPU class computing module, and an FPGA class computing module as an example, the step of obtaining the resource occupancy rate of the computing modules covers the following cases:
  • obtaining the resource occupancy rate of the CPU class computing module can be implemented by the following two steps, as shown in FIG. 7A:
  • Step 701 Read the cache occupancy rate of each on-chip router of the NoC.
  • The CPU class computing module in this embodiment is implemented by a network-on-chip (NoC). Since the CPU class computing module is implemented by the NoC, the resource occupancy rate of the CPU class computing module is obtained by acquiring the resource occupancy rate of the NoC.
  • the cache occupancy is used to characterize the communication resource occupancy of each on-chip router on the NoC.
  • the cache occupancy of each on-chip router is periodically calculated by the specified CPU on the NoC.
  • Each node of the NoC includes a CPU 71 and an on-chip router 72; that is, in each node, one CPU 71 is connected to one router 72, each CPU 71 stores a calculation rule and a cache, and the routers 72 implement communication between the CPUs 71.
  • the scheduling module in the heterogeneous system periodically reads the values of the registers in the router in the NoC that are connected to the specified CPU.
  • Step 702 Sum the cache occupancy rates to obtain the total cache occupancy rate, and determine the total cache occupancy rate as the resource occupancy rate of the NoC.
  • The scheduling module in the heterogeneous system sums the read values, that is, the cache occupancy rates, to obtain the total cache occupancy rate of the on-chip routers of the NoC, and determines the total cache occupancy rate as the resource occupancy rate of the NoC.
  • the total cache occupancy is used to characterize the communication resource occupancy on the NoC.
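  • A minimal sketch of steps 701 and 702, with the per-router register reads replaced by an already-collected list of cache occupancy values (this interface is an assumption for illustration, not the patent's API):

```python
# Sketch of steps 701-702: sum the per-router cache occupancy reported on the
# NoC into the NoC (communication) resource occupancy. Values are illustrative.

def noc_resource_occupancy(router_cache_occupancy: list[float]) -> float:
    """Sum the cache occupancy of every on-chip router of the NoC."""
    return sum(router_cache_occupancy)

# Example: four on-chip routers, each reporting a cache occupancy value.
print(noc_resource_occupancy([0.10, 0.05, 0.20, 0.15]))  # approximately 0.50
```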
  • For the FPGA class computing module, the configuration information corresponding to the task type of the computing task is sent to the FPGA class computing module; after receiving the configuration information, the FPGA class computing module dynamically invokes the programmable logic resources in the FPGA class computing module according to the configuration information to generate a corresponding hardware circuit.
  • the heterogeneous system records the occupancy of the programmable logic resources in the current FPGA class computing module.
  • Since the FPGA class computing module can include multiple servers, each of which can include multiple FPGAs, the manner of obtaining the resource occupancy rate of the FPGA class computing module differs depending on where the computing resources are located, and there are the following three cases:
  • First, when the computing resources used for computing the computing task are located on the same FPGA, the resource occupancy rate on that FPGA is obtained as the resource occupancy rate of the FPGA class computing module.
  • the scheduling module of the heterogeneous system obtains the logical resource occupancy rate of the FPGA class computing module.
  • the scheduling module of the heterogeneous system acquires the logical resource occupancy rate on the FPGA.
  • the logical resource occupancy is equal to the number of logical resources occupied on the FPGA divided by the total number of logical resources on the FPGA.
  • The black rectangles 44 represent occupied logical resources. The scheduling module 40 acquires the logical resource occupancy rate on the FPGA 45; the FPGA 45 has a total of 20 logical resources, of which 3 are occupied, so the resource occupancy rate obtained by the scheduling module 40 is 3/20.
  • Second, when the computing resources used for computing the computing task are located on different FPGAs, the resource occupancy rate of each FPGA and the transmission overhead between the different FPGAs are obtained as the resource occupancy rate of the FPGA class computing module.
  • the transfer overhead between different FPGAs is used to characterize how long it takes for data to travel from one FPGA to another.
  • the scheduling module of the heterogeneous system obtains the logical resource occupancy rate and the communication resource occupancy rate of the FPGA class computing module.
  • Specifically, the scheduling module of the heterogeneous system acquires the logical resource occupancy rate on the different FPGAs and the transmission overhead between the different FPGAs. After acquiring them, the scheduling module performs weighted processing on the logical resource occupancy rate on the different FPGAs and the transmission overhead between the different FPGAs to obtain the resource occupancy rate of the FPGA class computing module.
  • According to the formula M = x1t1 + x2t2, the resource occupancy rate of the FPGA class computing module is calculated, where M is the resource occupancy rate of the FPGA class computing module, t1 is the logical resource occupancy rate on the different FPGAs, x1 is the weight corresponding to the logical resource occupancy rate on the different FPGAs, t2 is the transmission overhead between the different FPGAs, and x2 is the weight corresponding to the transmission overhead between the different FPGAs.
  • The black rectangles 46 represent occupied logical resources. The scheduling module 49 acquires the logical resource occupancy of the FPGA 47 and the FPGA 48, and the transmission overhead between the FPGA 47 and the FPGA 48.
  • Third, when the computing resources used for computing the computing task are located on different FPGAs on different servers, the scheduling module of the heterogeneous system obtains the logical resource occupancy rate and the communication resource occupancy rate of the FPGA class computing module.
  • the transfer overhead between different servers is used to characterize how long it takes for data to travel from one server to another.
  • Specifically, the scheduling module acquires the logical resource occupancy rate on the different FPGAs and the transmission overhead between the different servers. After acquiring them, the scheduling module performs weighted processing on the logical resource occupancy rate on the different FPGAs and the transmission overhead between the different servers to obtain the resource occupancy rate of the FPGA class computing module.
  • According to the formula L = x3t3 + x4t4, the resource occupancy rate of the FPGA class computing module is calculated, where L is the resource occupancy rate of the FPGA class computing module, t3 is the logical resource occupancy rate on the different FPGAs, x3 is the weight corresponding to the logical resource occupancy rate on the different FPGAs, t4 is the transmission overhead between the different servers, and x4 is the weight corresponding to the transmission overhead between the different servers.
  • The black rectangles 51 represent occupied logical resources. The scheduling module 56 acquires the logical resource occupancy of the FPGA 53 and the FPGA 55, and the transmission overhead between the server 52 and the server 54.
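  • The three FPGA cases can be sketched with the two weighted formulas above, M = x1t1 + x2t2 and L = x3t3 + x4t4; the weights and example values below are illustrative assumptions, not values from the patent:

```python
# Sketch of the three FPGA cases described above. Weights and inputs are
# illustrative; the patent does not fix their values.

def fpga_occupancy_same_fpga(logical_occupancy: float) -> float:
    return logical_occupancy                      # case 1: single FPGA

def fpga_occupancy_multi_fpga(t1: float, t2: float,
                              x1: float = 0.7, x2: float = 0.3) -> float:
    return x1 * t1 + x2 * t2                      # case 2: M = x1*t1 + x2*t2

def fpga_occupancy_multi_server(t3: float, t4: float,
                                x3: float = 0.6, x4: float = 0.4) -> float:
    return x3 * t3 + x4 * t4                      # case 3: L = x3*t3 + x4*t4

print(fpga_occupancy_same_fpga(3 / 20))                 # the 3/20 example above
print(fpga_occupancy_multi_fpga(t1=0.25, t2=0.10))      # hypothetical values
print(fpga_occupancy_multi_server(t3=0.25, t4=0.30))    # hypothetical values
```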
  • Step 901 Obtain a device queue occupancy rate on the GPU type computing module.
  • In the abstraction layer of the GPU class computing module, the GPU is scheduled and managed through a device queue, so the resource occupancy rate of the GPU class computing module is obtained through the occupancy rate of the device queue.
  • the device queue occupancy is equal to the occupied resources in the device queue divided by the total resources of the device queue. For example, if the device queue has 100 resources and 50 resources are occupied, the device queue occupancy rate is 1/2.
  • Step 902 Determine the device queue occupancy rate as the resource occupancy rate of the GPU class computing module.
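  • A minimal sketch of steps 901 and 902, where the device queue occupancy is used directly as the resource occupancy rate of the GPU class computing module (the numbers reuse the 50/100 example above):

```python
# Sketch of steps 901-902: the device-queue occupancy is used directly as the
# resource occupancy of the GPU class computing module.

def gpu_resource_occupancy(queue_used: int, queue_total: int) -> float:
    """Occupied device-queue resources divided by total device-queue resources."""
    return queue_used / queue_total

print(gpu_resource_occupancy(50, 100))  # -> 0.5, matching the example above
```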
  • For example, suppose the computing modules are a GPU class computing module, a CPU class computing module, and an FPGA class computing module. When the computing task D received by the heterogeneous system is a data query operation, it is determined that the task type of the computing task D is a query, and the time overhead of the computing task D on the GPU class computing module, the CPU class computing module, and the FPGA class computing module is predicted according to the task type. If the CPU class computing module has the minimum time overhead, the CPU class computing module is determined as the first computing module. If the first weighted sum of the CPU class computing module does not exceed the predetermined threshold, the CPU class computing module is determined as the target computing module and executes the computing task D. If the first weighted sum is greater than the second weighted sum 1 of the GPU class computing module but less than the second weighted sum 2 of the FPGA class computing module, the GPU class computing module is used as the target computing module, and the computing task D is executed by the GPU class computing module.
  • Alternatively, the computing modules may be any two of the GPU class computing module, the CPU class computing module, and the FPGA class computing module. Those skilled in the art can derive other implementations of the computing task allocation method from the foregoing embodiments, which are not described in detail here.
  • FIG. 10 shows a block diagram of a computing task allocation apparatus according to an embodiment of the present invention.
  • The computing task allocation apparatus can be implemented as all or part of a heterogeneous system by software, hardware, or a combination of both.
  • the computing task distribution device includes:
  • the determining unit 1010 is configured to implement the functions of at least one of the above steps 301, 304, step 304a, step 304b, step 304c, and other implicit or disclosed determining steps.
  • the predicting unit 1020 is configured to implement the functions of the foregoing step 302, and the functions of other implicit or disclosed prediction steps.
  • the obtaining unit 1030 is configured to implement the functions of the foregoing step 303, and other functions of the implicit or public acquisition step.
  • the allocating unit 1040 is configured to implement the functions of at least one of the steps 305 above, and the functions of other implicit or disclosed allocation steps.
  • FIG. 10 shows a block diagram of a computing task allocation apparatus according to an embodiment of the present invention.
  • the computing task allocation device can be implemented as all or part of a heterogeneous system by software, hardware, or a combination of both.
  • the computing task distribution device includes:
  • the determining unit 1010 is configured to implement the functions of at least one of the foregoing steps 501, 504 to 507, step 601, step 604 to step 611, and other implicit or disclosed determining steps.
  • the predicting unit 1020 is configured to implement the functions of the foregoing steps 502 and 602, and other implicit or disclosed prediction steps.
  • the obtaining unit 1030 is configured to implement the functions of the foregoing steps 503 and 603, and other implicit or public acquisition steps.
  • the allocating unit 1040 is configured to implement the functions of at least one of the above steps 508 and 612, and the functions of other implicit or disclosed allocation steps.
  • The determining unit 1010 may be implemented by the scheduling module of the heterogeneous system executing the determining module in the memory; the predicting unit 1020 may be implemented by the scheduling module of the heterogeneous system executing the predicting module in the memory; the obtaining unit 1030 may be implemented by the scheduling module of the heterogeneous system executing the obtaining module in the memory; and the allocating unit 1040 may be implemented by the scheduling module of the heterogeneous system executing the allocating module in the memory.
  • A person skilled in the art may understand that all or part of the steps of the foregoing embodiments may be implemented by hardware, or may be implemented by a program instructing related hardware, and the program may be stored in a computer readable storage medium.
  • the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Stored Programmes (AREA)

Abstract

A heterogeneous system, a computing task allocation method and an apparatus, relating to the field of data processing. The method is applied to a heterogeneous system that includes n types of computing modules and includes: determining, according to the task type of a computing task to be allocated, at least two types of computing modules capable of executing the computing task from among the n types of computing modules (301); predicting the time overhead of executing the computing task on each of the at least two types of computing modules (302), and obtaining the resource occupancy rate of each type of computing module (303); determining a target computing module from the at least two types of computing modules according to the time overhead and the resource occupancy rate (304); and allocating the computing task to the target computing module (305). This solves the problem that considering only the task type when allocating computing tasks may reduce the computing efficiency of the entire heterogeneous system, and achieves the effect of improving the overall computing efficiency of the heterogeneous system by jointly considering the time overhead and resource usage of each type of computing unit.

Description

Heterogeneous system, computing task allocation method and apparatus

Technical Field

The embodiments of the present invention relate to the field of data processing, and in particular to a heterogeneous system, a computing task allocation method and an apparatus.

Background

Medium and large databases store massive amounts of data, so they need to perform computation on that data at a high processing speed. Some current medium and large databases use a heterogeneous system to accelerate the computing tasks in the database.

A heterogeneous system is a computing arrangement in which computing units that use different types of instruction sets and architectures are combined into one computing system. Common heterogeneous systems include a central processing unit (CPU) type computing module, a graphics processing unit (GPU) type computing module and a field programmable gate array (FPGA) type computing module. Each type of computing module is good at certain kinds of computing tasks and takes less time when executing the tasks it is good at. When the heterogeneous system receives a computing task, the operating system in the heterogeneous system allocates the computing task to the corresponding computing module according to the task type of the computing task. For example: if the task type of computing task A is a complex-operation type, computing task A is allocated to the CPU type computing module for processing; if the task type of computing task B is a floating-point type, computing task B is allocated to the GPU type computing module for processing; if the task type of computing task C is a parallel-operation type, computing task C is allocated to the FPGA type computing module for processing.

In the above allocation process, only the task type of the computing task is considered. When one type of computing module has few idle resources but is assigned many computing tasks while other types of computing modules have many idle resources, the computing efficiency of the entire heterogeneous system is reduced.
Summary of the Invention

To solve the problems of the prior art, the present invention provides a heterogeneous system, a computing task allocation method and an apparatus. The technical solutions are as follows:

In a first aspect, an embodiment of the present invention provides a computing task allocation method. Because a heterogeneous system considers only the task type of a computing task when allocating it, the computing efficiency of the entire heterogeneous system may be reduced. To take the resource usage of each computing module fully into account, the method of allocating computing tasks is improved.

As a possible implementation of the present application, the computing task allocation method includes: determining, according to the task type of a computing task to be allocated, at least two types of computing modules capable of executing the computing task from among the n types of computing modules included in the heterogeneous system; predicting the time overhead of executing the computing task on each of the at least two types of computing modules, and obtaining the resource occupancy rate of each type of computing module; determining a target computing module from the at least two types of computing modules according to the time overhead and the resource occupancy rate; and allocating the computing task to the target computing module, which executes the computing task.

In the present application, the time overhead of the computing task on each type of computing module is predicted according to the task type, and the resource occupancy rate of each type of computing module is obtained, so that both the time overhead and the resource usage are considered when determining the computing module that will execute the task. This helps solve the problem that considering only the task type may reduce the computing efficiency of the entire heterogeneous system, and achieves the effect of improving the overall computing efficiency of the heterogeneous system by jointly considering the time overhead and resource usage of each type of computing module when allocating computing tasks.
With reference to the first aspect, in a first possible implementation of the first aspect, determining the target computing module from the at least two types of computing modules according to the time overhead and the resource occupancy rate includes: calculating a weighted sum for each type of computing module according to its time overhead and its resource occupancy rate; and determining, as the target computing module, the computing module whose weighted sum does not exceed a predetermined threshold and which has the smallest time overhead. In this implementation, on the premise of considering the resource usage of each type of computing module, the computing module with the smallest time overhead is preferentially determined as the target computing module, which helps to fully exploit the computing performance of the heterogeneous system.

With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, determining, as the target computing module, the computing module whose weighted sum does not exceed the predetermined threshold and which has the smallest time overhead includes: detecting whether the weighted sum of each type of computing module exceeds the predetermined threshold; and if the weighted sum of at least one type of computing module does not exceed the predetermined threshold, determining the computing module with the smallest time overhead among them as the target computing module. Pre-screening the computing modules by weighted sum filters out the computing modules whose time overhead and resource occupancy rate are both relatively low, and the computing module with the smallest time overhead is then selected from the pre-screened modules as the target computing module, which improves the computing efficiency of the heterogeneous system and fully exploits its computing performance.

With reference to the first possible implementation of the first aspect, in a third possible implementation of the first aspect, determining, as the target computing module, the computing module whose weighted sum does not exceed the predetermined threshold and which has the smallest time overhead includes: determining a first computing module with the smallest time overhead; detecting whether the weighted sum of the first computing module exceeds the predetermined threshold; and if it does not, determining the first computing module as the target computing module. By first determining the first computing module with the smallest time overhead and, when its weighted sum does not exceed the predetermined threshold, directly determining it as the target computing module, the weighted sums of the other computing modules do not need to be calculated, which reduces the amount of calculation, speeds up the determination of the target computing module, and further improves the computing efficiency of the heterogeneous system.

In a fourth possible implementation of the first aspect, determining the target computing module from the at least two types of computing modules according to the time overhead and the resource occupancy rate includes: calculating a weighted sum for each type of computing module according to its time overhead and its resource occupancy rate; and determining the computing module with the smallest weighted sum as the target computing module. By jointly considering time overhead and resource occupancy rate and preferentially determining the computing module with the smallest weighted sum as the target computing module, the computing task can be executed as soon as possible, which reduces its waiting time and improves the computing efficiency of the heterogeneous system.

With reference to any one of the first to fourth possible implementations of the first aspect, in a fifth possible implementation of the first aspect, calculating the weighted sum of each type of computing module according to its time overhead and its resource occupancy rate includes:

calculating the weighted sum of each type of computing module according to the following formula:

Y = k1α1 + k2α2

where Y is the weighted sum of the computing module, α1 is the resource occupancy rate of the computing module, k1 is the weight corresponding to the resource occupancy rate, α2 is the time overhead of executing the computing task on the computing module, and k2 is the weight corresponding to the time overhead.
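For concreteness, a minimal Python sketch of this weighted-sum formula is given below; the function name and the example weights k1 = 0.6 and k2 = 0.4 (mentioned later in the detailed description as one possible choice) are illustrative assumptions, not part of the claimed method.

```python
def weighted_sum(resource_occupancy: float, time_overhead: float,
                 k1: float = 0.6, k2: float = 0.4) -> float:
    """Y = k1*alpha1 + k2*alpha2, as defined in the fifth implementation."""
    return k1 * resource_occupancy + k2 * time_overhead

# Example: a module at 30% occupancy with a predicted 0.04 s time overhead.
print(weighted_sum(0.30, 0.04))  # ≈ 0.196
```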
With reference to the first aspect or any one of the first to fifth possible implementations of the first aspect, in a sixth possible implementation of the first aspect, the at least two types of computing modules include at least two of a CPU type computing module, a GPU type computing module and an FPGA type computing module.

With reference to the first aspect or any one of the first to sixth possible implementations of the first aspect, in a seventh possible implementation of the first aspect, the resource occupancy rate includes the computing resource occupancy rate and/or the communication resource occupancy rate of the computing module.

With reference to the first aspect or any one of the first to seventh possible implementations of the first aspect, in an eighth possible implementation of the first aspect, the computing modules include a CPU type computing module implemented by a network on chip (NoC), and obtaining the resource occupancy rate of the computing module includes: reading the buffer occupancy rate of each on-chip router of the NoC, the buffer occupancy rate being used to characterize the communication resource occupancy on the NoC, and the buffer occupancy rates of the on-chip routers being collected periodically by a designated CPU on the NoC; and summing the buffer occupancy rates to obtain a total buffer occupancy rate, which is determined as the resource occupancy rate of the NoC.

With reference to the first aspect or any one of the first to eighth possible implementations of the first aspect, in a ninth possible implementation of the first aspect, the computing modules include a GPU type computing module, and obtaining the resource occupancy rate of the computing module includes: obtaining the device queue occupancy rate of the GPU type computing module, and determining the device queue occupancy rate as the resource occupancy rate of the GPU type computing module.

With reference to the first aspect or any one of the first to ninth possible implementations of the first aspect, in a tenth possible implementation of the first aspect, the computing modules include an FPGA type computing module, and obtaining the resource occupancy rate of the computing module includes: when the computing resources used for the computing task are located on the same FPGA, obtaining the resource occupancy rate of that FPGA as the resource occupancy rate of the FPGA type computing module; when the computing resources used for the computing task are located on different FPGAs, obtaining the resource occupancy rate of each FPGA and the transmission overhead between the FPGAs as the resource occupancy rate of the FPGA type computing module; and when the computing resources used for the computing task are located on different FPGAs on different servers, obtaining the resource occupancy rate of each FPGA and the transmission overhead between the servers as the resource occupancy rate of the FPGA type computing module.
In a second aspect, an embodiment of the present invention provides a computing task allocation apparatus that includes at least one unit configured to implement the computing task allocation method provided in the first aspect or in any possible implementation of the first aspect.

In a third aspect, an embodiment of the present invention provides a heterogeneous system that includes a scheduling module, a memory and n computing modules, where n is an integer greater than 1, and the scheduling module is configured to implement the computing task allocation method provided in the first aspect or in any possible implementation of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium that stores an executable program for implementing the computing task allocation method provided in the first aspect or in any possible implementation of the first aspect.
Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required in the description of the embodiments. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic structural diagram of an implementation environment of a computing task allocation method according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a heterogeneous system according to an embodiment of the present invention;

FIG. 3 is a flowchart of a computing task allocation method according to an embodiment of the present invention;

FIG. 4A is a flowchart of a computing task allocation method according to an embodiment of the present invention;

FIG. 4B is a flowchart of a computing task allocation method according to another embodiment of the present invention;

FIG. 5 is a flowchart of a computing task allocation method according to another embodiment of the present invention;

FIG. 6 is a flowchart of a computing task allocation method according to another embodiment of the present invention;

FIG. 7A is a flowchart of some steps of a computing task allocation method according to another embodiment of the present invention;

FIG. 7B is a schematic structural diagram of a NoC according to an embodiment of the present invention;

FIG. 8A is a schematic structural diagram of an FPGA according to an embodiment of the present invention;

FIG. 8B is a schematic structural diagram of an FPGA according to another embodiment of the present invention;

FIG. 8C is a schematic structural diagram of an FPGA according to another embodiment of the present invention;

FIG. 9 is a flowchart of some steps of a computing task allocation method according to an embodiment of the present invention;

FIG. 10 is a structural block diagram of a computing task allocation apparatus according to an embodiment of the present invention.
Detailed Description

To make the objectives, technical solutions and advantages of the present invention clearer, the following further describes the embodiments of the present invention in detail with reference to the accompanying drawings.

In this document, "a plurality of" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the associated objects.

Referring to FIG. 1, which shows a schematic structural diagram of an implementation environment of a computing task allocation method according to an embodiment of the present invention, the implementation environment includes a database 110, a database operation server 120 and a client 130.

The database 110 is used to store data.

The database operation server 120 is used to process the data stored in the database 110. To increase its data processing speed, the database operation server 120 uses a heterogeneous system for acceleration. In other words, the database operation server 120 is a server or server cluster implemented with a heterogeneous system.

The client 130 is a device, such as a mobile phone, a tablet computer or a personal computer, that sends a computing task on the data to the database operation server 120 and requests the database operation server 120 to process the computing task.

After the client 130 sends a computing task to the database operation server 120, the database operation server 120 reads the data stored in the database 110 and processes it. Illustratively, computing tasks include database operations such as data query, data sorting and data summation. The specific types of computing tasks may differ in different implementation scenarios.

The database 110 is connected to the database operation server 120 through a network.

The database operation server 120 is connected to the client 130 through a wired or wireless network.

Optionally, the wireless or wired network uses standard communication technologies and/or protocols. The network is usually the Internet, but may be any network, including but not limited to any combination of a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a mobile, wired or wireless network, a private network or a virtual private network. In some embodiments, technologies and/or formats such as Hyper Text Markup Language (HTML) and Extensible Markup Language (XML) are used to represent the data exchanged over the network. In addition, conventional encryption techniques such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN) and Internet Protocol Security (IPsec) may be used to encrypt all or some of the links. In other embodiments, customized and/or dedicated data communication techniques may be used in place of or in addition to the above data communication techniques.
Referring to FIG. 2, which shows a schematic structural diagram of a heterogeneous system 200 according to an exemplary embodiment of the present invention, the heterogeneous system 200 includes a scheduling module 210, a memory 220, a network interface 230, a GPU type computing module 240, a CPU type computing module 250 and an FPGA type computing module 260. The GPU type computing module 240, the CPU type computing module 250 and the FPGA type computing module 260 are the three types of computing modules in the heterogeneous system 200.

Optionally, the heterogeneous system 200 includes at least two of the GPU type computing module 240, the CPU type computing module 250 and the FPGA type computing module 260.

The scheduling module 210 may be implemented by a CPU, a GPU or an FPGA. Taking a CPU implementation as an example, the scheduling module 210 includes one or more processing cores. The scheduling module 210 performs various functional applications and data processing by running software programs and modules, for example: determining, according to the task type of a computing task to be allocated, at least two types of computing modules capable of executing the computing task from among the n types of computing modules; predicting the time overhead of executing the computing task on each of the at least two types of computing modules and obtaining the resource occupancy rate of each type of computing module; determining a target computing module from the at least two types of computing modules according to the time overhead and the resource occupancy rate; and allocating the computing task to the target computing module.

The memory 220 is used to store software programs and modules.

The memory 220 may store an operating system 21 and an application program module 22 required by at least one function. The operating system 21 may be an operating system such as a real-time operating system (Real Time eXecutive, RTX), LINUX, UNIX, WINDOWS or OS X. The application program module 22 may include a determining module, a predicting module, an obtaining module, an allocating module, and the like.

The determining module is configured to determine, according to the task type of a computing task to be allocated, at least two types of computing modules capable of executing the computing task from among the n types of computing modules; the predicting module is configured to predict the time overhead of executing the computing task on each of the at least two types of computing modules; the obtaining module is configured to obtain the resource occupancy rate of each type of computing module; the determining module is further configured to determine a target computing module from the at least two types of computing modules according to the time overhead and the resource occupancy rate; and the allocating module is configured to allocate the computing task to the target computing module, which executes the computing task.

In addition, the memory 220 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disc.

There may be multiple network interfaces 230, which are used by the heterogeneous system 200 to obtain data from the database 110, receive computing tasks and communicate with other devices.

The memory 220, the network interface 230, the GPU type computing module 240, the CPU type computing module 250 and the FPGA type computing module 260 are each connected to the scheduling module 210. Optionally, when the heterogeneous system 200 is a single server, they are connected to the scheduling module 210 through a bus; when the heterogeneous system 200 is a server cluster, they are connected to the scheduling module 210 through a network.

A person skilled in the art may understand that the structure of the heterogeneous system 200 shown in FIG. 2 does not limit the heterogeneous system 200, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. For example, the heterogeneous system 200 includes n types of computing modules, where n is an integer greater than or equal to 2; that is, in some embodiments the heterogeneous system 200 may include two types of computing modules, and in other embodiments four types.
Referring to FIG. 3, which shows a flowchart of a computing task allocation method according to an exemplary embodiment of the present invention, this embodiment is described by taking the application of the method to the heterogeneous system 200 shown in FIG. 2 as an example. The method includes the following steps.

Step 301: determine, according to the task type of a computing task to be allocated, at least two types of computing modules capable of executing the computing task from among the n types of computing modules.

When a user processes data on a client, the client generates a corresponding computing task and sends it to the heterogeneous system, and the scheduling module in the heterogeneous system receives the computing task.

Optionally, the computing task is a processing operation on the data stored in the database, such as querying data, sorting data, updating data, deleting data, filtering data, or performing mathematical operations on data, where the mathematical operations include summation, difference, product, quotient, remainder, average, maximum, minimum, and so on.

Optionally, the task types of computing tasks include query, sorting, mathematical operation, filtering, comparison, update, deletion, and so on.

Optionally, when the computing task is described in a database language, the task type of the computing task is determined according to the operation name in the database language. For example, the task type of a query operation is data query.

For a computing task of a given task type, at least two types of computing modules in the heterogeneous system are capable of executing it, and the scheduling module determines those at least two types from the n types of computing modules of the heterogeneous system. For example, the CPU type and GPU type computing modules are capable of executing computing tasks of task type A; as another example, the CPU type, GPU type and FPGA type computing modules are capable of executing computing tasks of task type B. It should be noted that some task types can be executed by only one type of computing module, for example a computing task of task type C is only suitable for the CPU type computing module; computing tasks of such task types are not discussed in this embodiment.
Step 302: predict the time overhead of executing the computing task on each of the at least two types of computing modules.

Each type of computing module capable of executing the computing task has a different time overhead when executing a computing task of a given task type. The time overhead characterizes the time the computing module needs to spend executing the computing task.

The scheduling module predicts, according to the task type of the computing task, the time overhead of executing the computing task on each of the at least two types of computing modules.

Optionally, the computing modules capable of executing the computing task include at least two of the CPU type, FPGA type and GPU type computing modules.

Optionally, the heterogeneous system stores a correspondence among task type, computing module type and time overhead.
Table 1

Task type    Computing module type    Time overhead
Type A       CPU                      0.01 s
Type A       GPU                      0.20 s
Type A       FPGA                     0.04 s
Type B       CPU                      0.03 s
Type B       GPU                      0.14 s
Type B       FPGA                     0.05 s

Table 1 schematically shows the correspondence among task type, computing module type and time overhead.
The scheduling module predicts the time overhead of the computing task on each type of computing module according to this preset correspondence: given the task type of the computing task, it looks up the time overhead of the computing task on each type of computing module in the preset correspondence, which stores the mapping among task type, computing module type and time overhead.

The larger the time overhead predicted by the scheduling module, the more time the computing task will take; the smaller the predicted time overhead, the less time it will take.

For example, the computing modules include the GPU type, FPGA type and CPU type computing modules, and the task type of the computing task is data query; the scheduling module predicts that the time overhead of the computing task is 0.01 s on the CPU type computing module, 0.02 s on the GPU type computing module and 0.04 s on the FPGA type computing module.
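A minimal sketch of such a lookup-based prediction is shown below in Python; the table contents combine the illustrative values of Table 1 with the 0.01/0.02/0.04 s query example above, and the function and dictionary names are assumptions made here rather than names used by the patent.

```python
# Preset correspondence among task type, computing-module type and time overhead.
TIME_OVERHEAD = {
    ("A", "CPU"): 0.01, ("A", "GPU"): 0.20, ("A", "FPGA"): 0.04,
    ("B", "CPU"): 0.03, ("B", "GPU"): 0.14, ("B", "FPGA"): 0.05,
    ("query", "CPU"): 0.01, ("query", "GPU"): 0.02, ("query", "FPGA"): 0.04,
}

def predict_time_overhead(task_type: str, module_types: list[str]) -> dict[str, float]:
    """Step 302: look up the predicted time overhead for each candidate module type."""
    return {m: TIME_OVERHEAD[(task_type, m)] for m in module_types}

print(predict_time_overhead("query", ["CPU", "GPU", "FPGA"]))
# {'CPU': 0.01, 'GPU': 0.02, 'FPGA': 0.04}
```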
Step 303: obtain the resource occupancy rate of each type of computing module.

The resource occupancy rate indicates how the resources in a computing module are being used.

Optionally, the resources of a computing module include computing resources, or communication resources, or both.

The scheduling module in the heterogeneous system obtains the resource occupancy rate of each type of computing module.

It should be noted that step 302 and step 303 may be performed at the same time.

Step 304: determine a target computing module from the at least two types of computing modules according to the time overhead and the resource occupancy rate.

The scheduling module in the heterogeneous system determines the target computing module according to the time overhead and the resource occupancy rate.

Step 305: allocate the computing task to the target computing module.

The target computing module is used to execute the computing task.

The scheduling module in the heterogeneous system allocates the computing task to the target computing module, which executes the computing task.

In summary, in the computing task allocation method provided in this embodiment of the present invention, the time overhead of the computing task on each type of computing module is predicted according to its task type, and the resource occupancy rate of each type of computing module is obtained, so that both the time overhead and the resource usage are considered when determining the computing module that will execute the task. This helps solve the problem that considering only the task type when allocating computing tasks may reduce the computing efficiency of the entire heterogeneous system, and achieves the effect of improving the overall computing efficiency of the heterogeneous system by jointly considering the time overhead and resource usage of each type of computing module.
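To make steps 301 to 305 concrete, the following Python sketch strings the five steps together; the capability table, the occupancy probes and the `select_target` strategy are placeholders standing in for the mechanisms described in the embodiments below, so all names and sample values here are assumptions rather than the patent's API.

```python
# Step 301: which module types can execute which task types (illustrative only).
CAPABILITY = {"query": ["CPU", "GPU", "FPGA"], "sort": ["CPU", "FPGA"]}

def allocate(task_type: str,
             predict,          # step 302: (task_type, module) -> predicted time overhead
             get_occupancy,    # step 303: module -> resource occupancy rate
             select_target,    # step 304: one of the strategies of FIG. 4A / FIG. 4B
             dispatch):        # step 305: hand the task to the chosen module
    candidates = CAPABILITY[task_type]                            # step 301
    overhead = {m: predict(task_type, m) for m in candidates}     # step 302
    occupancy = {m: get_occupancy(m) for m in candidates}         # step 303
    target = select_target(overhead, occupancy)                   # step 304
    dispatch(target)                                              # step 305
    return target

# Usage with stand-in values (weights k1 = 0.6, k2 = 0.4 as in the embodiments below).
target = allocate(
    "query",
    predict=lambda t, m: {"CPU": 0.01, "GPU": 0.02, "FPGA": 0.04}[m],
    get_occupancy=lambda m: {"CPU": 0.7, "GPU": 0.3, "FPGA": 0.5}[m],
    select_target=lambda oh, oc: min(oh, key=lambda m: 0.6 * oc[m] + 0.4 * oh[m]),
    dispatch=print,
)  # prints "GPU"
```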
Step 304 in the embodiment of FIG. 3 has several possible implementations; two of them are provided herein.

In a first possible implementation, step 304 may instead be implemented as step 304a and step 304b, as shown in FIG. 4A:

Step 304a: calculate the weighted sum of each type of computing module according to its time overhead and its resource occupancy rate.

Specifically, the scheduling module calculates the weighted sum of each type of computing module according to the following formula:

Y = k1α1 + k2α2

where Y is the weighted sum of the computing module, α1 is its resource occupancy rate, k1 is the weight corresponding to the resource occupancy rate, α2 is the time overhead of executing the computing task on the computing module, and k2 is the weight corresponding to the time overhead. Optionally, k1 and k2 are constants, for example k1 = 0.6 and k2 = 0.4.

Assume there are three types of computing modules capable of executing the computing task. The scheduling module calculates the weighted sum of the first type of computing module from its time overhead and resource occupancy rate, the weighted sum of the second type from its time overhead and resource occupancy rate, and the weighted sum of the third type from its time overhead and resource occupancy rate.

Step 304b: determine, as the target computing module, the computing module whose weighted sum does not exceed a predetermined threshold and which has the smallest time overhead.

That is, in this implementation the target computing module must satisfy two conditions:

Condition 1: its weighted sum does not exceed the predetermined threshold, which indicates that the target computing module is not busy;

Condition 2: it has the smallest time overhead, which indicates that the target computing module is the one best suited to executing the computing task.

When a computing module satisfies both conditions, the scheduling module determines it as the target computing module.

In summary, in this embodiment the computing module whose weighted sum does not exceed the predetermined threshold and which has the smallest time overhead is determined as the target computing module; on the premise of considering the resource usage of each type of computing module, the computing module with the smallest time overhead is preferentially chosen, which helps to fully exploit the computing performance of the heterogeneous system.
In a second possible implementation, step 304 may instead be implemented as step 304a and step 304c, as shown in FIG. 4B:

Step 304a: calculate the weighted sum of each type of computing module according to its time overhead and its resource occupancy rate.

Step 304c: determine the computing module with the smallest weighted sum as the target computing module.

The computing module with the smallest weighted sum is the one best suited to executing the computing task when both time overhead and resource occupancy rate are taken into account.

The scheduling module determines this computing module as the target computing module.

In summary, in this embodiment the computing module with the smallest weighted sum is preferentially determined as the target computing module by jointly considering time overhead and resource occupancy rate, so that the computing task can be executed as soon as possible, which reduces its waiting time and improves the computing efficiency of the heterogeneous system.
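A minimal Python sketch of the FIG. 4B strategy (step 304a followed by step 304c) is given below; the function name and the default weights are assumptions for illustration.

```python
def select_min_weighted_sum(overhead: dict[str, float],
                            occupancy: dict[str, float],
                            k1: float = 0.6, k2: float = 0.4) -> str:
    """Step 304a: Y = k1*occupancy + k2*overhead; step 304c: pick the smallest Y."""
    weighted = {m: k1 * occupancy[m] + k2 * overhead[m] for m in overhead}
    return min(weighted, key=weighted.get)

print(select_min_weighted_sum({"CPU": 0.01, "GPU": 0.02, "FPGA": 0.04},
                              {"CPU": 0.90, "GPU": 0.30, "FPGA": 0.50}))  # GPU
```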
In step 304b of FIG. 4A above, since the target computing module must satisfy both conditions, step 304b has at least two different implementations depending on whether condition 1 is checked before condition 2 or condition 2 is checked before condition 1. These two implementations are described below using the embodiments of FIG. 5 and FIG. 6.

Referring to FIG. 5, which shows a flowchart of a computing task allocation method according to an exemplary embodiment of the present invention, this embodiment is described by taking the application of the method to the heterogeneous system shown in FIG. 2 as an example. The allocation method includes the following steps.

Step 501: determine, according to the task type of a computing task to be allocated, at least two types of computing modules capable of executing the computing task from among the n types of computing modules.

The scheduling module obtains the task type of the computing task to be allocated, and determines, from the n types of computing modules of the heterogeneous system and according to that task type, at least two types of computing modules capable of executing the computing task.

Step 502: predict the time overhead of executing the computing task on each of the at least two types of computing modules.

The scheduling module predicts, according to the task type of the computing task, the time overhead of executing the computing task on each of the at least two types of computing modules.

Step 503: obtain the resource occupancy rate of each type of computing module.

The resource occupancy rate indicates how the resources in a computing module are being used.

Optionally, the resources of a computing module include computing resources, or communication resources, or both.

The scheduling module in the heterogeneous system obtains the resource occupancy rate of each type of computing module.

It should be noted that step 502 and step 503 may be performed at the same time; or step 502 may be performed before step 503, or step 503 before step 502, which is not limited in this embodiment.
Step 504: calculate the weighted sum of each type of computing module according to its time overhead and its resource occupancy rate.

Specifically, the scheduling module calculates the weighted sum of each type of computing module according to the following formula:

Y = k1α1 + k2α2

where Y is the weighted sum of the computing module, α1 is its resource occupancy rate, k1 is the weight corresponding to the resource occupancy rate, α2 is the time overhead of executing the computing task on the computing module, and k2 is the weight corresponding to the time overhead.

Assume there are three types of computing modules capable of executing the computing task. The scheduling module calculates the weighted sum of each of the three types from that type's time overhead and resource occupancy rate.

Step 505: detect whether the weighted sum of each type of computing module exceeds the predetermined threshold.

If the weighted sum of at least one type of computing module does not exceed the predetermined threshold, go to step 506; if the weighted sums of all the computing modules exceed the predetermined threshold, go to step 507.

Step 506: if the weighted sum of at least one type of computing module does not exceed the predetermined threshold, determine the computing module with the smallest time overhead as the target computing module.

Illustratively, if the weighted sums of the second and third types of computing modules do not exceed the predetermined threshold and the time overhead of the second type is smaller than that of the third type, the second type, which has the smallest time overhead, is determined as the target computing module.

Step 507: if the weighted sums of all the computing modules exceed the predetermined threshold, abandon this allocation, or randomly determine one type of computing module as the target computing module, or determine the target computing module in some other way.

Illustratively, if the weighted sums of all the computing modules exceed the predetermined threshold, all types of computing modules are relatively busy; in this case the computing task may be randomly allocated to one type of computing module for execution, or allocated to the computing module with the smallest time overhead, or allocated to the computing module with the smallest resource occupancy rate. The processing used in step 507 is not limited in this embodiment.
Step 508: allocate the computing task to the target computing module.

The target computing module is used to execute the computing task.

The scheduling module in the heterogeneous system allocates the computing task to the target computing module, which executes the computing task.

In summary, in this embodiment the computing modules are pre-screened using the weighted sum, which filters out the computing modules whose time overhead and resource occupancy rate are both relatively low; the computing module with the smallest time overhead is then selected from the pre-screened modules as the target computing module, which improves the computing efficiency of the heterogeneous system and fully exploits its computing performance.
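The following Python sketch mirrors steps 504 to 507 of FIG. 5 (threshold screening first, then smallest time overhead, with a simple fallback when every module exceeds the threshold); the threshold value, the weights, and the choice of "smallest time overhead" as the fallback are assumptions picked from the options this embodiment allows.

```python
def select_target_fig5(overhead: dict[str, float], occupancy: dict[str, float],
                       threshold: float = 0.5, k1: float = 0.6, k2: float = 0.4) -> str:
    # Step 504: weighted sum Y = k1*alpha1 + k2*alpha2 for every candidate module.
    weighted = {m: k1 * occupancy[m] + k2 * overhead[m] for m in overhead}
    # Step 505: keep the modules whose weighted sum does not exceed the threshold.
    eligible = [m for m in weighted if weighted[m] <= threshold]
    if eligible:
        # Step 506: among the eligible modules, pick the smallest time overhead.
        return min(eligible, key=lambda m: overhead[m])
    # Step 507 (one allowed option): fall back to the smallest time overhead overall.
    return min(overhead, key=overhead.get)

print(select_target_fig5({"CPU": 0.01, "GPU": 0.02, "FPGA": 0.04},
                         {"CPU": 0.95, "GPU": 0.40, "FPGA": 0.50}))
# GPU (the CPU module is screened out by the threshold check)
```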
Referring to FIG. 6, which shows a flowchart of a computing task allocation method according to an exemplary embodiment of the present invention, this embodiment is described by taking the application of the method to the heterogeneous system shown in FIG. 2 as an example. The allocation method includes the following steps.

Step 601: determine, according to the task type of a computing task to be allocated, at least two types of computing modules capable of executing the computing task from among the n types of computing modules.

The scheduling module obtains the task type of the computing task to be allocated, and determines, from the n types of computing modules of the heterogeneous system and according to that task type, at least two types of computing modules capable of executing the computing task.

For example, the computing modules capable of executing the computing task include the CPU type computing module and the FPGA type computing module.

Step 602: predict the time overhead of executing the computing task on each of the at least two types of computing modules.

The scheduling module predicts, according to the task type of the computing task, the time overhead of executing the computing task on each of the at least two types of computing modules.

Step 603: determine the computing module with the smallest time overhead as the first computing module.

For example, if the time overhead of computing task A is 0.1 s on the CPU type computing module, 1 s on the GPU type computing module and 0.9 s on the FPGA type computing module, the computing module with the smallest time overhead is the CPU type computing module, which is the first computing module.

Step 604: obtain the resource occupancy rate of each type of computing module.

The resources in a computing module include computing resources, or communication resources, or both.

The resource occupancy rate is the computing resource occupancy rate of the computing module, or its communication resource occupancy rate, or the total occupancy rate of both.

The resource occupancy rate equals the occupied resources divided by the total available resources.

Optionally, step 604 may be performed at the same time as step 602.
Step 605: calculate the weighted sum of the first computing module according to its time overhead and its resource occupancy rate.

The weighted sum is the value obtained by summing the time overhead and the resource occupancy rate according to their respective weights.

Illustratively, the weighted sum of the first computing module is calculated according to the following formula:

Y = k1α1 + k2α2

where Y is the weighted sum of the first computing module, α1 is the resource occupancy rate, k1 is the weight corresponding to the resource occupancy rate, α2 is the time overhead, and k2 is the weight corresponding to the time overhead.

Optionally, k1 and k2 are constants, for example k1 = 0.6 and k2 = 0.4.

Step 606: detect whether the weighted sum of the first computing module exceeds the predetermined threshold.

Optionally, the predetermined threshold is set in advance and is generally an empirical value.

When it is detected that the weighted sum of the first computing module does not exceed the predetermined threshold, perform step 607; when it exceeds the predetermined threshold, perform step 608.

Step 607: determine the first computing module as the target computing module.

Step 608: if the predetermined threshold is exceeded, determine the computing modules other than the first computing module among the at least two types of computing modules as second computing modules.

For example, if the first computing module is the CPU type computing module and the computing modules capable of executing the computing task include the CPU type and FPGA type computing modules, the FPGA type computing module is determined as the second computing module.
Step 609: calculate the weighted sum of the second computing module according to its time overhead and its resource occupancy rate.

The second weighted sum is the value obtained by summing the time overhead and the resource occupancy rate according to their respective weights.

The type of the second computing module is different from that of the first computing module.

Illustratively, the second weighted sum of the second computing module is calculated according to the following formula:

L = k3α3 + k4α4

where L is the second weighted sum of the second computing module, α3 is the resource occupancy rate, k3 is the weight corresponding to the resource occupancy rate, α4 is the time overhead, and k4 is the weight corresponding to the time overhead.

Optionally, k3 and k4 are constants, for example k3 = 0.6 and k4 = 0.4.

Step 610: detect whether the weighted sum of the second computing module is smaller than the weighted sum of the first computing module.

If the weighted sum of the second computing module is smaller than that of the first computing module, perform step 611. If it is not smaller, perform step 607, that is, determine the first computing module as the target computing module.

Optionally, when there are two second computing modules and the weighted sum of one of them is smaller than that of the first computing module while the weighted sum of the other is larger, step 611 is still performed.

Optionally, when there are two second computing modules and both of their second weighted sums are larger than the first weighted sum, the first computing module is taken as the target computing module.

Step 611: if the weighted sum of the second computing module is smaller than that of the first computing module, determine the second computing module with the smallest second weighted sum as the target computing module.

Step 612: allocate the computing task to the target computing module.

The target computing module is used to execute the computing task.

The scheduling module of the heterogeneous system allocates the computing task to the target computing module, which executes the computing task.
In summary, in the computing task allocation method provided in this embodiment of the present invention, the time overhead of the computing task on each type of computing module is predicted according to its task type, and the resource occupancy rate of each type of computing module is obtained, so that both the time overhead and the resource usage are considered when determining the computing module that will execute the task. This helps solve the problem that considering only the task type when allocating computing tasks may reduce the computing efficiency of the entire heterogeneous system, and achieves the effect of improving the computing efficiency of the heterogeneous system by jointly considering the time overhead and resource usage of each type of computing module.

In addition, by calculating the weighted sum of the first computing module, comparing it with the predetermined threshold, calculating the weighted sums of the second computing modules only when the first weighted sum exceeds the threshold, and determining the target computing module from the weighted sums of the first and second computing modules, the heterogeneous system avoids allocating the computing task to a computing module that is poorly placed to process it, which helps to improve the efficiency of heterogeneous data processing.

In addition, by checking whether there is a computing module suitable for scheduling the computing task before calculating the weighted sums of the second computing modules, the method avoids wasting the computing resources of the heterogeneous system on calculating second weighted sums when the computing task is not suitable for rescheduling, which helps to guarantee the computing performance of the heterogeneous system.
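The FIG. 6 variant checks the smallest-overhead module first and only computes the other weighted sums when that module is too busy. A minimal Python sketch of steps 603 to 611 follows; the occupancy probe is passed in as a callable so the lazier evaluation is visible, and the names, threshold and default weights are illustrative assumptions.

```python
from typing import Callable

def select_target_fig6(overhead: dict[str, float],
                       get_occupancy: Callable[[str], float],
                       threshold: float = 0.5,
                       k1: float = 0.6, k2: float = 0.4) -> str:
    # Step 603: the module with the smallest time overhead is the first module.
    first = min(overhead, key=overhead.get)
    # Steps 604-606: weighted sum of the first module only.
    y_first = k1 * get_occupancy(first) + k2 * overhead[first]
    if y_first <= threshold:
        return first                                  # step 607
    # Steps 608-609: weighted sums of the remaining (second) modules.
    seconds = {m: k1 * get_occupancy(m) + k2 * overhead[m]
               for m in overhead if m != first}
    best_second = min(seconds, key=seconds.get)
    # Steps 610-611: take the best second module only if it beats the first one.
    return best_second if seconds[best_second] < y_first else first

occ = {"CPU": 0.95, "FPGA": 0.30}
print(select_target_fig6({"CPU": 0.1, "FPGA": 0.9}, lambda m: occ[m]))  # FPGA
```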
In each of the above embodiments, the heterogeneous system needs to obtain the resource occupancy rate of each type of computing module. Taking computing modules that include the CPU type, GPU type and FPGA type computing modules as an example, obtaining the resource occupancy rate covers the following three cases.

1. Obtaining the resource occupancy rate of the CPU type computing module, which can be done in the following two steps, as shown in FIG. 7A:

Step 701: read the buffer occupancy rate of each on-chip router of the NoC.

The CPU type computing module in this embodiment is implemented by a network on chip (NoC). Because the CPU type computing module is implemented by a NoC, obtaining its resource occupancy rate means obtaining the resource occupancy rate of the NoC.

The buffer occupancy rate characterizes the communication resource occupancy of each on-chip router on the NoC. The buffer occupancy rates of the on-chip routers are collected periodically by a designated CPU on the NoC.

As shown in FIG. 7B, the NoC includes multiple nodes, each of which includes a CPU 71 and a router 72. That is, in each node one CPU 71 is connected to one router 72, each CPU 71 stores its computation rules and buffers, and the routers 72 provide the communication among the CPUs 71. A designated CPU 73 on the NoC periodically collects the buffer occupancy of the input and output channels of each router 72 connected to a CPU 71, and stores the collected buffer occupancy in a register of the router connected to the designated CPU 73; the value of this register represents the buffer occupancy rate of each router 72 in the NoC.

The scheduling module in the heterogeneous system periodically reads the value of the register in the router connected to the designated CPU of the NoC.

Step 702: sum the buffer occupancy rates to obtain a total buffer occupancy rate, and determine the total buffer occupancy rate as the resource occupancy rate of the NoC.

The scheduling module in the heterogeneous system sums the values it reads, that is, the buffer occupancy rates, to obtain the total buffer occupancy rate of the on-chip routers of the NoC, and determines the total buffer occupancy rate as the resource occupancy rate of the NoC.

The total buffer occupancy rate characterizes the communication resource occupancy on the NoC.
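A minimal Python sketch of steps 701 and 702 is given below; the register-read callable is a stand-in for reading the register of the router attached to the designated CPU described above, so its name and the sample values are assumptions.

```python
def noc_resource_occupancy(read_router_registers) -> float:
    """Steps 701-702: read each on-chip router's buffer occupancy rate and sum them
    to obtain the total buffer occupancy rate, used as the NoC's resource occupancy."""
    buffer_rates = read_router_registers()   # step 701: values collected by the designated CPU
    return sum(buffer_rates)                 # step 702: total buffer occupancy rate

# Example with stand-in register values for four on-chip routers.
print(noc_resource_occupancy(lambda: [0.10, 0.05, 0.20, 0.15]))  # total ≈ 0.5
```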
2. Obtaining the resource occupancy rate of the FPGA type computing module.

When the heterogeneous system needs to allocate a computing task to the FPGA type computing module, it generates configuration information corresponding to the task type of the computing task and sends it to the FPGA type computing module; after receiving the configuration information, the FPGA type computing module dynamically invokes its programmable logic resources according to the configuration information and generates the corresponding hardware circuit.

Correspondingly, after allocating a computing task to the FPGA type computing module, the heterogeneous system records the current occupancy of the programmable logic resources in the FPGA type computing module.

Since the FPGA type computing module may include multiple servers and each server may include multiple FPGAs, obtaining the resource occupancy rate of the FPGA type computing module falls into the following three cases, depending on how the FPGA type computing module is deployed:

(1) When the computing resources used for the computing task are located on the same FPGA, the resource occupancy rate of that FPGA is obtained as the resource occupancy rate of the FPGA type computing module.

The scheduling module of the heterogeneous system obtains the logic resource occupancy rate of the FPGA type computing module.

When the logic resources to be allocated in the FPGA type computing module are located on the same FPGA, the scheduling module obtains the logic resource occupancy rate of that FPGA.

In this case, the logic resource occupancy rate equals the number of occupied logic resources on the FPGA divided by the total number of logic resources on the FPGA.

As shown in FIG. 8A, the black rectangles 44 represent occupied logic resources. When the logic resources to be allocated are located on the same FPGA 45, the scheduling module 40 obtains the logic resource occupancy rate of the FPGA 45; the FPGA 45 has 20 logic resources in total, of which 3 are occupied, so the resource occupancy rate obtained by the scheduling module is 3/20.
(2) When the computing resources used for the computing task are located on different FPGAs, the resource occupancy rate of each FPGA and the transmission overhead between the FPGAs are obtained as the resource occupancy rate of the FPGA type computing module.

The transmission overhead between different FPGAs characterizes the time needed to transfer data from one FPGA to another.

The scheduling module of the heterogeneous system obtains the logic resource occupancy rate and the communication resource occupancy rate of the FPGA type computing module.

When the logic resources to be allocated in the FPGA type computing module are located on different FPGAs in the same server, the scheduling module obtains the logic resource occupancy rates of the different FPGAs and the transmission overhead between them, and then weights the logic resource occupancy rates and the transmission overhead to obtain the resource occupancy rate of the FPGA type computing module.

Optionally, the resource occupancy rate of the FPGA type computing module is calculated according to the formula M = x1t1 + x2t2, where M is the resource occupancy rate of the FPGA type computing module, t1 is the logic resource occupancy rate on the different FPGAs, x1 is the weight corresponding to that occupancy rate, t2 is the transmission overhead between the different FPGAs, and x2 is the weight corresponding to that transmission overhead.

As shown in FIG. 8B, the black rectangles 46 represent occupied logic resources. When the logic resources to be allocated are located on FPGA 47 and FPGA 48, and both are in the same server 50, the scheduling module 49 obtains the logic resource occupancy rates of FPGA 47 and FPGA 48 and the transmission overhead between them.
(3) When the computing resources used for the computing task are located on different FPGAs in different servers, the resource occupancy rate of each FPGA and the transmission overhead between the servers are obtained as the resource occupancy rate of the FPGA type computing module.

The scheduling module of the heterogeneous system obtains the logic resource occupancy rate and the communication resource occupancy rate of the FPGA type computing module.

The transmission overhead between different servers characterizes the time needed to transfer data from one server to another.

When the logic resources to be allocated in the FPGA type computing module are located on different FPGAs in different servers, the scheduling module obtains the logic resource occupancy rates of the different FPGAs and the transmission overhead between the servers.

After obtaining them, the scheduling module weights the logic resource occupancy rates and the transmission overhead to obtain the resource occupancy rate of the FPGA type computing module.

Optionally, the resource occupancy rate of the FPGA type computing module is calculated according to the formula L = x3t3 + x4t4, where L is the resource occupancy rate of the FPGA type computing module, t3 is the logic resource occupancy rate on the different FPGAs, x3 is the weight corresponding to that occupancy rate, t4 is the transmission overhead between the different servers, and x4 is the weight corresponding to that transmission overhead.

As shown in FIG. 8C, the black rectangles 51 represent occupied logic resources. When the logic resources to be invoked are located on FPGA 53 in server 52 and FPGA 55 in server 54, the scheduling module 56 obtains the logic resource occupancy rates of FPGA 53 and FPGA 55 and the transmission overhead between server 52 and server 54.
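The three FPGA cases can be summarized in a short Python sketch; the weights, the averaging of the per-FPGA occupancy rates into a single t1/t3 value, and the data structure describing where the logic resources sit are assumptions introduced here for illustration, not the patent's own computation.

```python
def fpga_resource_occupancy(fpga_occupancy: dict[str, float],
                            same_server: bool,
                            inter_fpga_overhead: float = 0.0,
                            inter_server_overhead: float = 0.0,
                            x1: float = 0.7, x2: float = 0.3) -> float:
    """Resource occupancy rate of the FPGA type computing module for the three cases above."""
    if len(fpga_occupancy) == 1:
        # Case (1): all logic resources on one FPGA -> that FPGA's occupancy rate.
        return next(iter(fpga_occupancy.values()))
    # The per-FPGA rates are averaged here as one simple way to combine them.
    avg_occupancy = sum(fpga_occupancy.values()) / len(fpga_occupancy)
    if same_server:
        # Case (2): M = x1*t1 + x2*t2 with the inter-FPGA transmission overhead.
        return x1 * avg_occupancy + x2 * inter_fpga_overhead
    # Case (3): L = x3*t3 + x4*t4 with the inter-server transmission overhead
    # (the same weight values are reused here for simplicity).
    return x1 * avg_occupancy + x2 * inter_server_overhead

print(fpga_resource_occupancy({"fpga0": 3 / 20}, same_server=True))              # 0.15
print(fpga_resource_occupancy({"fpga0": 0.2, "fpga1": 0.4}, True, 0.1))          # ≈ 0.24
print(fpga_resource_occupancy({"fpga0": 0.2, "fpga1": 0.4}, False,
                              inter_server_overhead=0.3))                         # ≈ 0.30
```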
3. Obtaining the resource occupancy rate of the GPU type computing module, which can be done in the following two steps, as shown in FIG. 9:

Step 901: obtain the device queue occupancy rate of the GPU type computing module.

In the abstraction layer of the GPU type computing module, the GPU is scheduled and managed through a device queue, and the resource occupancy rate of the GPU type computing module is obtained from the occupancy rate of the device queue.

The device queue occupancy rate equals the occupied resources in the device queue divided by the total resources of the device queue. For example, if the device queue has 100 resources in total and 50 of them are occupied, the device queue occupancy rate is 1/2.

Step 902: determine the device queue occupancy rate as the resource occupancy rate of the GPU type computing module.
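A minimal sketch of steps 901 and 902, with illustrative names:

```python
def gpu_resource_occupancy(occupied_slots: int, total_slots: int) -> float:
    """Steps 901-902: the device queue occupancy rate is used directly
    as the resource occupancy rate of the GPU type computing module."""
    return occupied_slots / total_slots

print(gpu_resource_occupancy(50, 100))  # 0.5
```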
In an illustrative example, the computing modules are the GPU type, CPU type and FPGA type computing modules. When the computing task D received by the heterogeneous system is a data query operation, the task type of computing task D is determined to be query; the time overhead of computing task D on the GPU type, CPU type and FPGA type computing modules is predicted according to the task type; the CPU type computing module has the smallest time overhead and is determined as the first computing module. The first weighted sum of the CPU type computing module is calculated from its time overhead and resource occupancy rate; the first weighted sum is compared with the predetermined threshold and found to be larger, and second computing modules suitable for scheduling computing task D are detected: the GPU type and FPGA type computing modules. The second weighted sum 1 of the GPU type computing module and the second weighted sum 2 of the FPGA type computing module are then calculated, and it is detected whether the second weighted sums are smaller than the first weighted sum.

When it is detected that the first weighted sum is smaller than both the second weighted sum 1 of the GPU type computing module and the second weighted sum 2 of the FPGA type computing module, the CPU type computing module is determined as the target computing module and executes the computing task. If it is detected that the first weighted sum is larger than the second weighted sum 1 of the GPU type computing module but smaller than the second weighted sum 2 of the FPGA type computing module, the GPU type computing module is taken as the target computing module and executes computing task D.

It should be noted that the computing modules may be any two of the GPU type, CPU type and FPGA type computing modules, and a person skilled in the art may combine the above embodiments to obtain other implementations of the computing task allocation method, which are not repeated here.
Referring to FIG. 10, which shows a block diagram of a computing task allocation apparatus according to an embodiment of the present invention, the computing task allocation apparatus may be implemented, in software, hardware or a combination of both, as all or part of a heterogeneous system. The computing task allocation apparatus includes:

a determining unit 1010, configured to implement the functions of at least one of the above steps 301, 304, 304a, 304b and 304c, and the functions of other implicit or disclosed determining steps;

a predicting unit 1020, configured to implement the functions of the above step 302, and the functions of other implicit or disclosed predicting steps;

an obtaining unit 1030, configured to implement the functions of the above step 303, and the functions of other implicit or disclosed obtaining steps;

an allocating unit 1040, configured to implement the functions of at least one of the above step 305, and the functions of other implicit or disclosed allocating steps.

For related details, reference may be made to the method embodiment shown in FIG. 3, or the method embodiment shown in FIG. 4A, or the method embodiment shown in FIG. 4B.

It should be noted that the determining unit 1010 may be implemented by the scheduling module of the heterogeneous system executing the determining module in the memory; the predicting unit 1020 may be implemented by the scheduling module executing the predicting module in the memory; the obtaining unit 1030 may be implemented by the scheduling module executing the obtaining module in the memory; and the allocating unit 1040 may be implemented by the scheduling module executing the allocating module in the memory.
Referring to FIG. 10, which shows a block diagram of a computing task allocation apparatus according to an embodiment of the present invention, the computing task allocation apparatus may be implemented, in software, hardware or a combination of both, as all or part of a heterogeneous system. The computing task allocation apparatus includes:

a determining unit 1010, configured to implement the functions of at least one of the above steps 501, 504 to 507, 601 and 604 to 611, and the functions of other implicit or disclosed determining steps;

a predicting unit 1020, configured to implement the functions of the above steps 502 and 602, and the functions of other implicit or disclosed predicting steps;

an obtaining unit 1030, configured to implement the functions of the above steps 503 and 603, and the functions of other implicit or disclosed obtaining steps;

an allocating unit 1040, configured to implement the functions of at least one of the above steps 508 and 612, and the functions of other implicit or disclosed allocating steps.

For related details, reference may be made to the method embodiment shown in FIG. 5 or the method embodiment shown in FIG. 6.

It should be noted that the determining unit 1010 may be implemented by the scheduling module of the heterogeneous system executing the determining module in the memory; the predicting unit 1020 may be implemented by the scheduling module executing the predicting module in the memory; the obtaining unit 1030 may be implemented by the scheduling module executing the obtaining module in the memory; and the allocating unit 1040 may be implemented by the scheduling module executing the allocating module in the memory.
A person of ordinary skill in the art may understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.

The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (18)

  1. A heterogeneous system, wherein the heterogeneous system comprises a scheduling module and n types of computing modules, where n is an integer greater than 1;
    the scheduling module is configured to: determine, according to a task type of a computing task to be allocated, at least two types of computing modules capable of executing the computing task from among the n types of computing modules; predict a time overhead of executing the computing task on each of the at least two types of computing modules, and obtain a resource occupancy rate of each type of computing module; determine a target computing module from the at least two types of computing modules according to the time overhead and the resource occupancy rate; and allocate the computing task to the target computing module;
    the target computing module is configured to execute the computing task.
  2. The heterogeneous system according to claim 1, wherein
    the scheduling module is further configured to calculate a weighted sum of each type of computing module according to the time overhead of that type of computing module and the resource occupancy rate of that type of computing module;
    the scheduling module is configured to determine, as the target computing module, the computing module whose weighted sum does not exceed a predetermined threshold and which has the smallest time overhead.
  3. The heterogeneous system according to claim 1, wherein
    the scheduling module is further configured to calculate a weighted sum of each type of computing module according to the time overhead of that type of computing module and the resource occupancy rate of that type of computing module;
    the scheduling module is configured to determine the computing module with the smallest weighted sum as the target computing module.
  4. The heterogeneous system according to claim 2 or 3, wherein
    the scheduling module is configured to calculate the weighted sum of each type of computing module according to the following formula:
    Y = k1α1 + k2α2
    where Y is the weighted sum of the computing module, α1 is the resource occupancy rate of the computing module, k1 is the weight corresponding to the resource occupancy rate, α2 is the time overhead of executing the computing task on the computing module, and k2 is the weight corresponding to the time overhead.
  5. The heterogeneous system according to any one of claims 1 to 4, wherein the at least two types of computing modules comprise at least two of a central processing unit (CPU) type computing module, a graphics processing unit (GPU) type computing module and a field programmable gate array (FPGA) type computing module.
  6. The heterogeneous system according to any one of claims 1 to 5, wherein the resource occupancy rate comprises a computing resource occupancy rate and/or a communication resource occupancy rate of the computing module.
  7. A computing task allocation method, applied to a heterogeneous system comprising n types of computing modules, where n is an integer greater than 1, the method comprising:
    determining, according to a task type of a computing task to be allocated, at least two types of computing modules capable of executing the computing task from among the n types of computing modules;
    predicting a time overhead of executing the computing task on each of the at least two types of computing modules, and obtaining a resource occupancy rate of each type of computing module;
    determining a target computing module from the at least two types of computing modules according to the time overhead and the resource occupancy rate;
    allocating the computing task to the target computing module, the target computing module being configured to execute the computing task.
  8. The method according to claim 7, wherein determining the target computing module from the at least two types of computing modules according to the time overhead and the resource occupancy rate comprises:
    calculating a weighted sum of each type of computing module according to the time overhead of that type of computing module and the resource occupancy rate of that type of computing module;
    determining, as the target computing module, the computing module whose weighted sum does not exceed a predetermined threshold and which has the smallest time overhead.
  9. The method according to claim 7, wherein determining the target computing module from the at least two types of computing modules according to the time overhead and the resource occupancy rate comprises:
    calculating a weighted sum of each type of computing module according to the time overhead of that type of computing module and the resource occupancy rate of that type of computing module;
    determining the computing module with the smallest weighted sum as the target computing module.
  10. The method according to claim 8 or 9, wherein calculating the weighted sum of each type of computing module according to the time overhead of that type of computing module and the resource occupancy rate of that type of computing module comprises:
    calculating the weighted sum of each type of computing module according to the following formula:
    Y = k1α1 + k2α2
    where Y is the weighted sum of the computing module, α1 is the resource occupancy rate of the computing module, k1 is the weight corresponding to the resource occupancy rate, α2 is the time overhead of executing the computing task on the computing module, and k2 is the weight corresponding to the time overhead.
  11. The method according to any one of claims 7 to 10, wherein the at least two types of computing modules comprise at least two of a central processing unit (CPU) type computing module, a graphics processing unit (GPU) type computing module and a field programmable gate array (FPGA) type computing module.
  12. The method according to any one of claims 7 to 11, wherein the resource occupancy rate comprises a computing resource occupancy rate and/or a communication resource occupancy rate of the computing module.
  13. A computing task allocation apparatus, wherein the apparatus comprises n types of computing modules, where n is an integer greater than or equal to 2, and the apparatus comprises:
    a determining unit, configured to determine, according to a task type of a computing task to be allocated, at least two types of computing modules capable of executing the computing task from among the n types of computing modules;
    a predicting unit, configured to predict a time overhead of executing the computing task on each of the at least two types of computing modules;
    an obtaining unit, configured to obtain a resource occupancy rate of each type of computing module;
    the determining unit being further configured to determine a target computing module from the at least two types of computing modules according to the time overhead and the resource occupancy rate;
    an allocating unit, configured to allocate the computing task to the target computing module, the target computing module being configured to execute the computing task.
  14. The apparatus according to claim 13, wherein
    the determining unit is configured to calculate a weighted sum of each type of computing module according to the time overhead of that type of computing module and the resource occupancy rate of that type of computing module, and to determine, as the target computing module, the computing module whose weighted sum does not exceed a predetermined threshold and which has the smallest time overhead.
  15. The apparatus according to claim 13, wherein
    the determining unit is configured to calculate a weighted sum of each type of computing module according to the time overhead of that type of computing module and the resource occupancy rate of that type of computing module, and to determine the computing module with the smallest weighted sum as the target computing module.
  16. The apparatus according to claim 14 or 15, wherein
    the determining unit is configured to calculate the weighted sum of each type of computing module according to the following formula:
    Y = k1α1 + k2α2
    where Y is the weighted sum of the computing module, α1 is the resource occupancy rate of the computing module, k1 is the weight corresponding to the resource occupancy rate, α2 is the time overhead of executing the computing task on the computing module, and k2 is the weight corresponding to the time overhead.
  17. The apparatus according to any one of claims 13 to 16, wherein the at least two types of computing modules comprise at least two of a central processing unit (CPU) type computing module, a graphics processing unit (GPU) type computing module and a field programmable gate array (FPGA) type computing module.
  18. The apparatus according to any one of claims 13 to 17, wherein the resource occupancy rate comprises a computing resource occupancy rate and/or a communication resource occupancy rate of the computing module.
PCT/CN2016/103585 2016-10-27 2016-10-27 Heterogeneous system, computing task allocation method and apparatus WO2018076238A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201680056714.1A CN108604193A (zh) 2016-10-27 2016-10-27 异构系统、计算任务分配方法及装置
PCT/CN2016/103585 WO2018076238A1 (zh) 2016-10-27 2016-10-27 Heterogeneous system, computing task allocation method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/103585 WO2018076238A1 (zh) 2016-10-27 2016-10-27 Heterogeneous system, computing task allocation method and apparatus

Publications (1)

Publication Number Publication Date
WO2018076238A1 true WO2018076238A1 (zh) 2018-05-03

Family

ID=62023020

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/103585 WO2018076238A1 (zh) 2016-10-27 2016-10-27 Heterogeneous system, computing task allocation method and apparatus

Country Status (2)

Country Link
CN (1) CN108604193A (zh)
WO (1) WO2018076238A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110659112A * 2018-06-29 2020-01-07 CRRC Zhuzhou Electric Locomotive Research Institute Co., Ltd. Algorithm scheduling method and system
CN110909886A * 2019-11-20 2020-03-24 Beijing Xiaomi Mobile Software Co., Ltd. Machine learning network operation method, apparatus and medium
WO2021136512A1 * 2020-01-03 2021-07-08 Shenzhen Corerain Technologies Co., Ltd. Scheduling method, device and storage medium based on deep learning node computation
CN113835852A * 2021-08-26 2021-12-24 Neusoft Medical Systems Co., Ltd. Task data scheduling method and apparatus

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051064B * 2019-12-26 2024-05-24 China Mobile (Shanghai) Information and Communication Technology Co., Ltd. Task scheduling method, apparatus, device and storage medium
CN111783970A * 2020-06-30 2020-10-16 Lenovo (Beijing) Limited Data processing method and electronic device
CN111866902B * 2020-07-01 2022-09-27 China United Network Communications Group Co., Ltd. Resource utilization evaluation method and apparatus
CN112306662A * 2020-11-11 2021-02-02 Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co., Ltd. Multi-processing-unit cooperative computing apparatus

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103197976A * 2013-04-11 2013-07-10 Huawei Technologies Co., Ltd. Task processing method and apparatus for a heterogeneous system
CN104778080A * 2014-01-14 2015-07-15 ZTE Corporation Coprocessor-based job scheduling method and apparatus
CN104849698A * 2015-05-21 2015-08-19 Naval University of Engineering of the Chinese People's Liberation Army Radar signal parallel processing method and system based on a heterogeneous multi-core system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102171627A * 2008-10-03 2011-08-31 The University of Sydney Scheduling of applications for execution in a heterogeneous computing system
CN101739292B * 2009-12-04 2016-02-10 Dawning Information Industry (Beijing) Co., Ltd. Application-feature-based adaptive job scheduling method and system for heterogeneous clusters
CN103645954B * 2013-11-21 2018-12-14 Huawei Technologies Co., Ltd. CPU scheduling method, apparatus and system based on a heterogeneous multi-core architecture

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103197976A * 2013-04-11 2013-07-10 Huawei Technologies Co., Ltd. Task processing method and apparatus for a heterogeneous system
CN104778080A * 2014-01-14 2015-07-15 ZTE Corporation Coprocessor-based job scheduling method and apparatus
CN104849698A * 2015-05-21 2015-08-19 Naval University of Engineering of the Chinese People's Liberation Army Radar signal parallel processing method and system based on a heterogeneous multi-core system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110659112A * 2018-06-29 2020-01-07 CRRC Zhuzhou Electric Locomotive Research Institute Co., Ltd. Algorithm scheduling method and system
CN110909886A * 2019-11-20 2020-03-24 Beijing Xiaomi Mobile Software Co., Ltd. Machine learning network operation method, apparatus and medium
CN110909886B * 2019-11-20 2022-11-04 Beijing Xiaomi Mobile Software Co., Ltd. Machine learning network operation method, apparatus and medium
WO2021136512A1 * 2020-01-03 2021-07-08 Shenzhen Corerain Technologies Co., Ltd. Scheduling method, device and storage medium based on deep learning node computation
CN113835852A * 2021-08-26 2021-12-24 Neusoft Medical Systems Co., Ltd. Task data scheduling method and apparatus
CN113835852B * 2021-08-26 2024-04-12 Neusoft Medical Systems Co., Ltd. Task data scheduling method and apparatus

Also Published As

Publication number Publication date
CN108604193A (zh) 2018-09-28

Similar Documents

Publication Publication Date Title
WO2018076238A1 (zh) Heterogeneous system, computing task allocation method and apparatus
Wang et al. Maptask scheduling in mapreduce with data locality: Throughput and heavy-traffic optimality
Rahbari et al. Task offloading in mobile fog computing by classification and regression tree
US10289973B2 (en) System and method for analytics-driven SLA management and insight generation in clouds
US9354938B2 (en) Sequential cooperation between map and reduce phases to improve data locality
Xie et al. Pandas: robust locality-aware scheduling with stochastic delay optimality
Fu et al. Layered virtual machine migration algorithm for network resource balancing in cloud computing
US10719366B1 (en) Dynamic and selective hardware acceleration
KR101471749B1 (ko) 클라우드 서비스의 가상자원 할당을 위한 퍼지 로직 기반의 자원평가 장치 및 방법
CN110308984B (zh) 一种用于处理地理分布式数据的跨集群计算系统
Vakilinia et al. Analysis and optimization of big-data stream processing
US20220129460A1 (en) Auto-scaling a query engine for enterprise-level big data workloads
Shen et al. Performance modeling of big data applications in the cloud centers
Shen et al. Performance prediction of parallel computing models to analyze cloud-based big data applications
Maiyama et al. Performance modelling and analysis of an OpenStack IaaS cloud computing platform
Tikhonenko et al. Queueing systems with random volume customers and a sectorized unlimited memory buffer
Li et al. Performance analysis of cloud computing centers serving parallelizable rendering jobs using M/M/c/r queuing systems
Banerjee et al. Priority based K-Erlang distribution method in cloud computing
Yassir et al. Graph-based model and algorithm for minimising big data movement in a cloud environment
Wang et al. Model-based scheduling for stream processing systems
Ismail et al. Modeling and performance analysis to predict the behavior of a divisible load application in a cloud computing environment
Thieme Challenges for modelling of software-based packet processing in commodity-hardware using queueing theory
Yang et al. Yun: a high-performance container management service based on openstack
Manekar et al. Optimizing cost and maximizing profit for multi-cloud-based big data computing by deadline-aware optimize resource allocation
Morla et al. High-performance network traffic analysis for continuous batch intrusion detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16919945

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16919945

Country of ref document: EP

Kind code of ref document: A1