WO2016101115A1 - Resource scheduling method and related apparatus

Resource scheduling method and related apparatus (一种资源调度方法以及相关装置)

Info

Publication number
WO2016101115A1
WO2016101115A1 (PCT/CN2014/094581)
Authority
WO
WIPO (PCT)
Prior art keywords
data block
accessed
hotspot
task
hotspot data
Prior art date
Application number
PCT/CN2014/094581
Other languages
English (en)
French (fr)
Inventor
李嘉
刘杰
党李飞
毛凌志
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201480077812.4A (granted as CN106462360B)
Priority to PCT/CN2014/094581
Priority to EP14908686.0A (granted as EP3200083B1)
Publication of WO2016101115A1
Priority to US15/584,661 (granted as US10430237B2)
Priority to US16/558,983 (granted as US11194623B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, for peripheral storage systems, e.g. disk cache
    • G06F 12/0871 Allocation or management of cache space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/12 Replacement control
    • G06F 12/121 Replacement control using replacement algorithms
    • G06F 12/123 Replacement control using replacement algorithms with age lists, e.g. queue, most recently used [MRU] list or least recently used [LRU] list
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0659 Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671 In-line storage system
    • G06F 3/0673 Single storage device
    • G06F 3/0674 Disk device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3854 Instruction completion, e.g. retiring, committing or graduating
    • G06F 9/3856 Reordering of instructions, e.g. using queues or age tags
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/54 Interprogram communication
    • G06F 9/546 Message passing systems or structures, e.g. queues

Definitions

  • the present invention relates to the field of data processing, and in particular, to a resource scheduling method and related apparatus.
  • a distributed file system is a file management system suitable for big data storage.
  • the physical storage resources in the system are not necessarily connected to the local node, but are connected to multiple nodes through a computer network.
  • large data blocks are divided into multiple small data blocks and stored on multiple nodes, so that the distributed file system has high fault tolerance and throughput.
  • Hadoop Distributed File System (HDFS) is a commonly used highly fault-tolerant file system that can be deployed on inexpensive machines, which is very suitable for large-scale data sets.
  • In HDFS, data is aggregated into data blocks stored on the disks of data nodes (DN, DataNode), and an application task can read data from a data block on disk or write data to a data block on disk.
  • In analysis tasks over large-scale data, the data on disk must be repeatedly read and written by application tasks, so a large amount of time is spent on data input and output (IO) and tasks run for too long.
  • To mitigate this, HDFS at the current stage counts the number of historical accesses to each data block within a preset time period, determines the most frequently accessed data blocks as hotspot data blocks, and moves them into the memory of the DN, so that application tasks can access the hotspot data blocks directly through the DN's memory, thereby improving data IO efficiency.
  • However, the access history of a data block does not accurately reflect how hot the data block currently is: even if a data block was accessed many times historically, it may be accessed only rarely after being moved into memory. In that case, moving the data block into memory not only fails to significantly improve data IO efficiency but also wastes memory resources.
  • the embodiment of the invention provides a resource scheduling method, which can improve the data IO efficiency of the system.
  • a first aspect of the embodiments of the present invention provides a resource scheduling method, including:
  • the move-in instruction is used to indicate that the hotspot data block is moved into the memory, so that the hotspot data block can be accessed in the memory.
  • the determining a hotspot data block according to the number of times each data block is to be accessed by the application task includes:
  • determining the data blocks whose number of times to be accessed by the application task is not less than N as hotspot data blocks, where N is a preset value.
  • after the sending a move-in instruction to the local node of the hotspot data block, the method further includes:
  • when the local node of the hotspot data block currently has an idle slot, scheduling the application task corresponding to the hotspot data block to the local node of the hotspot data block.
  • after the sending a move-in instruction to the local node of the hotspot data block, the method further includes:
  • sequentially executing the application tasks corresponding to the hotspot data blocks in descending order of the number of times the hotspot data blocks are to be accessed by the application tasks.
  • In a fourth implementation manner of the first aspect of the embodiment of the present invention, the method further includes: determining a number of hotspot data blocks to be accessed by each of the application tasks; and executing each of the application tasks in sequence in descending order of the number of hotspot data blocks to be accessed.
  • In a fifth implementation manner of the first aspect of the embodiment of the present invention, the method further includes:
  • determining the number of times each data block in the memory is to be accessed by the application task; determining the P data blocks in the memory with the fewest times to be accessed by the application task as non-hotspot data blocks, where P is a preset value, or determining the data blocks in the memory whose number of times to be accessed by the application task is not greater than Q as non-hotspot data blocks, where Q is a preset value;
  • and sending a move-out instruction to the local node of the non-hotspot data block, where the move-out instruction instructs that the non-hotspot data block be removed from the memory.
  • the determining a current task queue includes: receiving an execution instruction of a job to be executed delivered by the client within a preset time period; dividing the job to be executed into a plurality of application tasks to be executed; and determining the set of the plurality of application tasks to be executed as the current task queue.
  • a second aspect of the embodiments of the present invention provides a resource scheduling apparatus, including:
  • a task queue determining module configured to determine a current task queue, where the task queue includes multiple application tasks to be executed
  • a first number determining module configured to determine, in a data block in a disk to be accessed by the application task, a number of times each data block is to be accessed by the application task;
  • a hotspot data determining module configured to determine a hotspot data block according to the number of times each of the data blocks is to be accessed by the application task
  • and a move-in instruction sending module configured to send a move-in instruction to the local node of the hotspot data block, where the move-in instruction instructs the hotspot data block to be moved into the memory so that the hotspot data block can be accessed in the memory.
  • the hotspot data determining module is specifically configured to:
  • determine a data block whose number of times to be accessed by the application task is not less than N as a hotspot data block, where N is a preset value.
  • the device further includes:
  • a task node scheduling module configured to: after the move-in instruction sending module sends a move-in instruction to the local node of the hotspot data block, when the local node of the hotspot data block currently has an idle slot, schedule the application task corresponding to the hotspot data block to the local node of the hotspot data block.
  • the device further includes:
  • a first sequence scheduling module configured to, after the move-in instruction sending module sends the move-in instruction to the local node of the hotspot data block, sequentially execute the application tasks corresponding to the hotspot data blocks in descending order of the number of times the hotspot data blocks are to be accessed by the application tasks.
  • the apparatus further includes:
  • a number of access determination module configured to determine a number of hotspot data blocks to be accessed by each of the application tasks
  • the device also includes:
  • a second sequence scheduling module configured to, after the move-in instruction sending module sends the move-in instruction to the local node of the hotspot data block, execute each of the application tasks in sequence in descending order of the number of hotspot data blocks to be accessed.
  • the device further includes:
  • a second number determining module configured to determine a number of times that each data block in the memory is to be accessed by the application task
  • a non-hotspot data determining module configured to determine the P data blocks in the memory with the fewest times to be accessed by the application task as non-hotspot data blocks, where P is a preset value, or determine the data blocks in the memory whose number of times to be accessed by the application task is not greater than Q as non-hotspot data blocks, where Q is a preset value;
  • and a move-out instruction sending module configured to send a move-out instruction to the local node of the non-hotspot data block, where the move-out instruction instructs that the non-hotspot data block be removed from the memory.
  • the task queue determining module includes:
  • an instruction receiving unit configured to receive, within a preset time period, an execution instruction of a job to be executed delivered by the client;
  • and a task dividing unit configured to divide the job to be executed into a plurality of application tasks to be executed, and determine the set of the plurality of application tasks to be executed as the current task queue.
  • a third aspect of the embodiments of the present invention provides a resource scheduling apparatus, including an input device, an output device, a processor, and a memory, where the memory stores operation instructions and, by invoking the operation instructions stored in the memory, the processor is configured to perform the following steps:
  • the move-in instruction is used to indicate that the hotspot data block is moved into the memory, so that the hotspot data block can be accessed in the memory.
  • the processor further performs the following steps:
  • determining a data block whose number of times to be accessed by the application task is not less than N as a hotspot data block, where N is a preset value.
  • the processor further performs the following steps:
  • after sending the move-in instruction to the local node of the hotspot data block, if the local node of the hotspot data block currently has a free slot, dispatching the application task corresponding to the hotspot data block to the local node of the hotspot data block.
  • the processor further performs the following steps:
  • sequentially executing the application tasks corresponding to the hotspot data blocks in descending order of the number of times the hotspot data blocks are to be accessed by the application tasks.
  • the processor further performs the following steps:
  • the method further includes:
  • Each of the application tasks is executed in sequence in descending order of the number of hotspot data blocks to be accessed.
  • the processor further performs the following steps:
  • determining the number of times each data block in the memory is to be accessed by the application task; determining the P data blocks in the memory with the fewest times to be accessed by the application task as non-hotspot data blocks, where P is a preset value, or determining the data blocks in the memory whose number of times to be accessed by the application task is not greater than Q as non-hotspot data blocks, where Q is a preset value;
  • and sending a move-out instruction to the local node of the non-hotspot data block, where the move-out instruction instructs that the non-hotspot data block be removed from the memory.
  • the processor further performs the following steps:
  • An embodiment of the present invention provides a resource scheduling method, including: determining a current task queue, where the task queue includes multiple application tasks to be executed; determining, among the data blocks in the disk to be accessed by the application tasks, the number of times each data block is to be accessed by the application tasks; determining hotspot data blocks according to the number of times each data block is to be accessed by the application tasks; and sending a move-in instruction to the local node of the hotspot data block, where the move-in instruction instructs the hotspot data block to be moved into the memory so that the hotspot data block can be accessed in the memory.
  • the embodiment of the present invention determines the hotspot degree of the data block by using the to-be-executed application task in the current task queue, and ensures that the determined hotspot data block is accessed more frequently after being moved into the memory.
  • Compared with the prior art, which determines hotspot data blocks according to the number of historical accesses, the resource scheduling method provided by the embodiment of the present invention can significantly improve data IO efficiency, thereby shortening the running time of application tasks without wasting memory resources unnecessarily.
  • FIG. 1 is a flowchart of an embodiment of a resource scheduling method according to an embodiment of the present invention
  • FIG. 2 is a flowchart of another embodiment of a resource scheduling method according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of another embodiment of a resource scheduling method according to an embodiment of the present invention.
  • FIG. 4 is a flowchart of another embodiment of a resource scheduling method according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of an embodiment of a resource scheduling apparatus according to an embodiment of the present invention;
  • FIG. 6 is a schematic diagram of another embodiment of a resource scheduling apparatus according to an embodiment of the present invention;
  • FIG. 7 is a schematic diagram of another embodiment of a resource scheduling apparatus according to an embodiment of the present invention;
  • FIG. 8 is a schematic diagram of another embodiment of a resource scheduling apparatus according to an embodiment of the present invention;
  • FIG. 9 is a schematic diagram of another embodiment of a resource scheduling apparatus according to an embodiment of the present invention.
  • the embodiment of the invention provides a resource scheduling method, which can improve data IO efficiency.
  • the present invention also proposes related resource scheduling devices, which will be separately described below.
  • For the basic process of the resource scheduling method provided by the embodiment of the present invention, refer to FIG. 1; the method mainly includes:
  • 101. The resource scheduling device determines a current task queue, where the task queue includes a plurality of application tasks to be executed.
  • The application tasks in the task queue need to access data blocks located in the disk.
  • 102. Among the data blocks in the disk to be accessed by the application tasks of the task queue, the resource scheduling device determines the number of times each data block is to be accessed by the application tasks.
  • 103. The resource scheduling device determines hotspot data blocks according to the number of times each data block is to be accessed by the application tasks.
  • A hotspot data block is a data block in the disk that is to be accessed many times. There are many methods for determining hotspot data blocks, which will be described in detail in the following embodiments and are not limited herein.
  • 104. After determining the hotspot data block, the resource scheduling device sends a move-in instruction to the local node of the hotspot data block, where the move-in instruction instructs the local node to move the hotspot data block from the disk into the memory, so that the hotspot data block can be accessed in the memory.
  • the local node of the hot data block refers to the node where the hot data block is located.
  • the local node of the hotspot data block preferentially moves the hotspot data block from the disk into the local memory.
  • the embodiment provides a resource scheduling method, including: determining a current task queue; determining a number of times each data block is to be accessed by an application task in a data block in a disk to be accessed by the application task; Determining a hotspot data block by the number of times the application task is accessed; sending a move-in instruction to the local node of the hotspot data block, the move-in instruction is used to instruct the hotspot data block to be moved into the memory, so that the hotspot data block can be accessed in the memory .
  • The hotspot degree of each data block is determined by the application tasks to be executed in the current task queue, which ensures that the determined hotspot data blocks will be accessed frequently by the to-be-executed application tasks after being moved into the memory.
  • Compared with the prior art, which determines hotspot data blocks according to the number of historical accesses, the resource scheduling method provided by this embodiment can significantly improve data IO efficiency, thereby shortening the running time of application tasks without wasting memory resources unnecessarily.
  • FIG. 1 provides the basic flow of a resource scheduling method according to an embodiment of the present invention, in which the resource scheduling apparatus determines the hotspot degree of data blocks by using the application tasks to be executed in the current task queue; how the hotspot data blocks are determined is described in more detail below.
  • Specifically, the method by which the resource scheduling device determines the hotspot data block according to the number of times each data block is to be accessed by the application task in step 103 may be: determining the first M data blocks with the highest number of times to be accessed by the application task as hotspot data blocks; or determining the data blocks whose number of times to be accessed by the application task is not less than N as hotspot data blocks, where M and N are preset values.
  • the resource scheduling apparatus may determine the hotspot data block according to the number of times that each data block is to be accessed by the application task by using other methods, which is not limited herein.
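As a minimal illustration (not part of the patent text), the top-M / at-least-N selection described above can be sketched in Python. The data model is an assumption: each pending application task is represented simply by the identifier of the one disk block it will access.

```python
from collections import Counter

def find_hotspot_blocks(block_accesses, M=None, N=None):
    """block_accesses: one entry per pending application task, naming the
    disk data block that task will access (each task reads one block).
    Returns hotspot blocks either as the top-M most-accessed blocks or as
    every block accessed at least N times; M and N are preset values."""
    counts = Counter(block_accesses)
    if M is not None:
        # Top-M blocks by pending access count (step 103, first variant).
        return [blk for blk, _ in counts.most_common(M)]
    # All blocks with at least N pending accesses (second variant).
    return sorted(blk for blk, c in counts.items() if c >= N)
```

Either rule yields the set of blocks whose move-in instructions are then sent to their local nodes in step 104.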
  • Optionally, the resource scheduling apparatus may also determine whether the local node of the hotspot data block currently has a free slot. If the local node of the hotspot data block currently has an idle slot, the application task corresponding to the hotspot data block is scheduled to that local node, so that the application task does not need to access the hotspot data block across nodes, thereby improving the data IO efficiency of the system.
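The free-slot locality check above can be sketched as follows; `block_location`, `free_slots`, and the task representation are illustrative assumptions, not structures defined in the patent.

```python
def schedule_to_local_node(task, block_location, free_slots):
    """If the node holding the task's hotspot block currently has a free
    slot, place the task there so it reads the block from local memory
    rather than across the network.
    block_location: block_id -> node_id; free_slots: node_id -> int."""
    node = block_location[task["block_id"]]
    if free_slots.get(node, 0) > 0:
        free_slots[node] -= 1
        return node   # data-local placement succeeded
    return None       # caller falls back to ordinary scheduling
```

Returning `None` leaves the task to whatever default placement the scheduler would otherwise use, matching the optional nature of this step.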
  • The embodiment shown in FIG. 1 explains in detail how the resource scheduling method provided by the present invention determines and schedules hotspot data blocks. Another resource scheduling method is provided below, which, on the basis of the embodiment shown in FIG. 1, further schedules the application tasks corresponding to the hotspot data blocks; see FIG. 2.
  • the basic process includes:
  • Steps 201 to 204 are substantially the same as steps 101 to 104 and are not described here again.
  • After the resource scheduling device sends the move-in instruction to the local node of the hotspot data block, the local node of the hotspot data block moves the hotspot data block into the memory.
  • At this point, the application tasks in the task queue can access the hotspot data blocks directly from memory.
  • 205. The resource scheduling apparatus further schedules the execution order of the application tasks in the task queue. The specific method is: sequentially executing the application tasks corresponding to the hotspot data blocks in descending order of the number of times the hotspot data blocks are to be accessed by the application tasks.
  • the embodiment provides a resource scheduling method, including: determining a current task queue; determining a number of times each data block is to be accessed by an application task in a data block in a disk to be accessed by the application task; Determining a hotspot data block by the number of times the application task is accessed; sending a move-in instruction to the local node of the hotspot data block, the move-in instruction is used to instruct the hotspot data block to be moved into the memory, so that the hotspot data block can be accessed in the memory
  • and sequentially executing the application tasks corresponding to the hotspot data blocks in descending order of the number of times the hotspot data blocks are to be accessed by the application tasks.
  • The hotspot degree of each data block is determined by the application tasks to be executed in the current task queue, which ensures that the determined hotspot data blocks will be accessed frequently by the to-be-executed application tasks after being moved into the memory.
  • the resource scheduling method provided by the embodiment can significantly improve the data IO efficiency, thereby shortening the running time of the application task without causing unnecessary memory resource waste.
  • In addition, the application tasks corresponding to the hotspot data blocks are executed in descending order of the number of times the hotspot data blocks are to be accessed, so the tasks corresponding to data blocks with a higher hotspot degree are executed first; this optimizes the execution order of the application tasks in the work queue and improves the efficiency with which the system performs application tasks.
  • the resource scheduling device can also schedule the execution order of the application tasks in the work queue by other methods.
  • the method further includes the step of: the resource scheduling device determining the number of hotspot data blocks to be accessed by each application task.
  • Step 205 can be replaced by executing each application task in turn in descending order of the number of hotspot data blocks to be accessed. In this way, the application tasks that access many hotspot data blocks are executed first, which likewise optimizes the execution order of the application tasks in the work queue and improves the efficiency with which the system executes application tasks.
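This replacement ordering can be sketched as below; the mapping from task names to the set of blocks each task will access is an illustrative assumption.

```python
def order_tasks(tasks, hotspot_blocks):
    """Run first the tasks that touch the most hotspot blocks (the
    replacement for step 205 described above).
    tasks: task name -> set of block ids that task will access."""
    hot = set(hotspot_blocks)
    # Sort tasks by how many hotspot blocks each one accesses, descending.
    return sorted(tasks, key=lambda t: len(tasks[t] & hot), reverse=True)
```

Because `sorted` is stable, tasks touching the same number of hotspot blocks keep their original queue order.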
  • The embodiment shown in FIG. 2 gives a method of optimizing the execution order of application tasks in the work queue after the hotspot data blocks have been moved into the memory.
  • the number of data blocks that can be accommodated in memory is limited.
  • If hotspot data blocks are continuously moved into the memory, then once the number of data blocks moved in reaches the upper limit the memory can accommodate, the memory can no longer accept new hotspot data blocks.
  • For the resource scheduling method to continue to run, it is necessary to ensure that the memory has enough space to accommodate new data blocks.
  • To ensure that the memory of a node has sufficient space to accommodate new data blocks, the present invention provides a new embodiment. Referring to FIG. 3, the basic process of another resource scheduling method provided by an embodiment of the present invention includes:
  • Steps 301 to 305 are substantially the same as steps 201 to 205, and are not described herein.
  • the number of data blocks that can be held by memory is limited.
  • If hotspot data blocks are continuously moved into the memory, then once the number of data blocks moved in reaches the upper limit the memory can accommodate, the memory can no longer accept new hotspot data blocks.
  • Therefore, the resource scheduling apparatus determines, according to the task queue, which data blocks located in the memory are to be accessed less frequently, and moves those data blocks out of the memory. For the specific method, refer to steps 306 to 308.
  • 306. The resource scheduling device determines, among the data blocks in the memory, the number of times each data block is to be accessed by the application tasks in the task queue.
  • 307. The resource scheduling device determines non-hotspot data blocks according to the number of times each data block in the memory is to be accessed by the application tasks, where a non-hotspot data block is a data block located in the memory that is to be accessed by the application tasks relatively few times.
  • Specifically, the P data blocks in the memory with the fewest times to be accessed by the application tasks may be determined as non-hotspot data blocks, where P is a preset value; or the data blocks in the memory whose number of times to be accessed by the application tasks is not greater than Q may be determined as non-hotspot data blocks, where Q is a preset value.
  • the resource scheduling device may also determine non-hotspot data blocks by other methods, which is not limited herein.
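The two eviction-candidate rules above (the P least-accessed blocks, or blocks with at most Q pending accesses) can be sketched as follows; the in-memory access-count mapping is an illustrative assumption.

```python
def find_non_hotspot_blocks(mem_access_counts, P=None, Q=None):
    """mem_access_counts: in-memory block id -> number of times that block
    is still to be accessed by tasks in the queue. Returns the P
    least-accessed in-memory blocks, or every in-memory block with at
    most Q pending accesses; P and Q are preset values."""
    if P is not None:
        # Rank in-memory blocks by pending access count, fewest first.
        ranked = sorted(mem_access_counts, key=mem_access_counts.get)
        return ranked[:P]
    # All in-memory blocks accessed no more than Q times.
    return sorted(b for b, c in mem_access_counts.items() if c <= Q)
```

The resulting blocks are those for which step 308 sends a move-out instruction to the local node.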
  • 308. After determining the non-hotspot data block, the resource scheduling device sends a move-out instruction to the local node of the non-hotspot data block, where the move-out instruction instructs the local node of the non-hotspot data block to move the non-hotspot data block from the memory back to the disk.
  • the embodiment provides a resource scheduling method, including: determining a current task queue; determining a number of times each data block is to be accessed by an application task in a data block in a disk to be accessed by the application task; Determining a hotspot data block by the number of times the application task is accessed; sending a move-in instruction to the local node of the hotspot data block, the move-in instruction is used to instruct the hotspot data block to be moved into the memory, so that the hotspot data block can be accessed in the memory
  • the application tasks corresponding to each hotspot data block are sequentially executed; the number of times each data block in the memory is to be accessed by the application task is determined; non-hotspot data blocks are determined according to the number of times each data block in the memory is to be accessed by the application task; and a move-out instruction is sent to the local node of the non-hotspot data block, instructing the local node of the non-hotspot data block to remove the non-hotspot data block from the memory.
  • The hotspot degree of each data block is determined by the application tasks to be executed in the current task queue, which ensures that the determined hotspot data blocks will be accessed frequently by the to-be-executed application tasks after being moved into the memory.
  • the resource scheduling method provided by the embodiment can significantly improve the data IO efficiency, thereby shortening the running time of the application task without causing unnecessary memory resource waste.
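The full cycle summarized above (count pending accesses, move hot blocks in, run tasks hottest-first, move cold blocks out) can be modeled in a few lines of Python. This is an illustrative sketch only: moving a block is simulated by transferring it between two sets rather than by sending move-in/move-out instructions to local nodes, and all names and thresholds are ours.

```python
def schedule(task_queue, disk_blocks, memory_blocks, hot_min, cold_max):
    """One scheduling round over a queue of (task_id, block_id) pairs."""
    # Count how many pending tasks target each block.
    counts = {}
    for _, block in task_queue:
        counts[block] = counts.get(block, 0) + 1
    # Move-in: on-disk blocks to be accessed at least hot_min times.
    for block in [b for b in disk_blocks if counts.get(b, 0) >= hot_min]:
        disk_blocks.remove(block)
        memory_blocks.add(block)
    # Run tasks whose block is now memory-resident, hottest block first.
    done = sorted((t for t in task_queue if t[1] in memory_blocks),
                  key=lambda t: -counts[t[1]])
    remaining = [t for t in task_queue if t[1] not in memory_blocks]
    # Move-out: in-memory blocks with at most cold_max remaining accesses.
    rest = {}
    for _, block in remaining:
        rest[block] = rest.get(block, 0) + 1
    for block in [b for b in set(memory_blocks) if rest.get(b, 0) <= cold_max]:
        memory_blocks.remove(block)
        disk_blocks.add(block)
    return done, remaining

queue = [("t1", "A"), ("t2", "A"), ("t3", "B")]
disk, mem = {"A", "B"}, set()
done, remaining = schedule(queue, disk, mem, hot_min=2, cold_max=0)
```

Block A is accessed twice, so it is moved in and its two tasks run; block B stays on disk, and A, with no remaining accesses, is moved back out.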
• The application tasks corresponding to each hotspot data block are executed in descending order of the number of times the blocks are to be accessed, so tasks corresponding to data blocks with a higher hotspot degree run first; this optimizes the execution order of application tasks in the work queue and improves the efficiency with which the system executes them.
• In addition, the resource scheduling device determines the non-hotspot data blocks in memory and instructs their local nodes to remove them, so that the data blocks kept in memory are all blocks with a high hotspot degree, achieving dynamic optimization of hotspot data blocks in memory.
• The resource scheduling device receives, within a preset time period, an execution instruction of a job to be executed issued by the client, where the execution instruction instructs the resource scheduling device to perform that job.
  • the preset time period may be an artificially set time period, or may be a default time period of the resource scheduling device, or may be other time segments, which is not limited herein.
  • the work to be performed is divided into multiple application tasks to be executed, and the set of the plurality of application tasks to be executed is determined as the current task queue.
  • each work to be performed can be divided into one or more application tasks to be executed, and each application task to be executed needs to access one data block.
• For example, a job to be performed needs to access a 128M data file, and the size of each data block in the distributed file system is 32M, so the resource scheduling device divides the job into four application tasks to be executed.
  • Each application task to be executed is used to access a 32M data block.
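The division above is simple ceiling arithmetic over the file and block sizes. A minimal sketch (sizes in megabytes; the function name is ours, not from the patent):

```python
def split_into_tasks(file_size_mb, block_size_mb=32):
    """One application task per data block the job must access."""
    return -(-file_size_mb // block_size_mb)  # ceiling division

tasks = split_into_tasks(128)  # a 128M file over 32M blocks -> 4 tasks
```

A file that is not an exact multiple of the block size still gets one task for the partial final block, which is why the ceiling rather than the floor is taken.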
• The work to be performed in this embodiment may include only one job, but more typically it includes multiple jobs; the resource scheduling device divides those jobs into multiple application tasks to be executed and determines the set of those tasks as the current task queue.
  • Steps 403 to 409 are substantially the same as steps 302 to 308, and are not described herein.
• This embodiment provides a resource scheduling method, including: receiving, within a preset time period, execution instructions of work to be executed delivered by the client; dividing the work into multiple application tasks to be executed and determining the set of those tasks as the current task queue; determining, among the data blocks in the disk to be accessed by the application tasks, the number of times each data block is to be accessed; determining hotspot data blocks according to those counts; sending a move-in instruction to the local node of each hotspot data block, instructing it to move the block into memory so that the block can be accessed there; executing the application tasks corresponding to each hotspot data block in descending order of access count; determining the number of times each data block in memory is to be accessed by application tasks; determining non-hotspot data blocks according to those counts; and sending a move-out instruction to the local node of each non-hotspot data block, instructing it to move the block out of memory to the disk.
• Because the hotspot degree of a data block is determined from the application tasks to be executed in the current task queue, a determined hotspot data block is guaranteed to be accessed many times by pending tasks after it is moved into memory. The resource scheduling method provided by this embodiment can therefore significantly improve data IO efficiency, shortening the running time of application tasks without causing unnecessary waste of memory resources.
• In addition, the resource scheduling device schedules the execution order of application tasks in the work queue so that tasks corresponding to hotspot data blocks with higher access counts are executed preferentially.
• The resource scheduling apparatus further determines the non-hotspot data blocks in memory and instructs their local nodes to remove them, so that the data blocks kept in memory are all blocks with a high hotspot degree, achieving dynamic optimization of hotspot data blocks in memory.
• For example, the resource scheduling device receives an execution instruction of work to be executed delivered by the client; the instruction directs the device to perform the work, which needs to access a 128M data file.
  • each data block in the distributed file system is 32M, so the resource scheduling device divides the work to be performed into four application tasks to be executed, and each application task to be executed is used to access a 32M data block.
  • the resource scheduling device determines the set of four application tasks to be executed as the current task queue.
• The resource scheduling device determines, among the data blocks in the disk to be accessed by the application tasks of the task queue, the number of times each data block is to be accessed, obtaining 100 data blocks in total: 20 data blocks are to be accessed 300 times, 30 data blocks 200 times, and 50 data blocks 100 times.
• The resource scheduling device determines every data block whose number of times to be accessed by application tasks is not less than 150 as a hotspot data block, that is, the 20 data blocks to be accessed 300 times and the 30 data blocks to be accessed 200 times.
  • the resource scheduling device sends a move-in instruction to the local node of the hotspot data block. After receiving the move-in instruction, the local node of the hotspot data block moves the hotspot data block from the disk into the memory, so that the hotspot data block can be accessed in the memory.
• In descending order of the number of times the blocks are to be accessed, the resource scheduling device first executes the application tasks corresponding to the 20 data blocks to be accessed 300 times, and then executes the application tasks corresponding to the 30 data blocks to be accessed 200 times.
• After executing the application tasks corresponding to the hotspot data blocks, the resource scheduling device determines the number of times each data block in memory is to be accessed by the application tasks now in the task queue: 30 data blocks are to be accessed 100 times, and 30 data blocks are to be accessed 160 times.
• The resource scheduling device determines the in-memory data blocks whose number of times to be accessed is not more than 150 as non-hotspot data blocks, that is, the 30 data blocks to be accessed 100 times by application tasks in the task queue.
• After determining the non-hotspot data blocks, the resource scheduling device sends a move-out instruction to the local node of each non-hotspot data block; on receiving it, the local node moves the block out of memory to the disk.
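The numbers in this worked example can be checked with a short script applying the 150-access threshold on both the move-in and move-out sides. This is illustrative code, not part of the patent; the block identifiers are ours.

```python
# 100 on-disk blocks: 20 to be accessed 300x, 30 at 200x, 50 at 100x.
disk_counts = {f"d{i}": 300 for i in range(20)}
disk_counts.update({f"d{i}": 200 for i in range(20, 50)})
disk_counts.update({f"d{i}": 100 for i in range(50, 100)})

hot = {b for b, n in disk_counts.items() if n >= 150}   # blocks to move in

# Later, in-memory blocks: 30 now pending 100 accesses, 30 pending 160.
mem_counts = {f"m{i}": 100 for i in range(30)}
mem_counts.update({f"m{i}": 160 for i in range(30, 60)})

cold = {b for b, n in mem_counts.items() if n <= 150}   # blocks to move out
```

As the text states, 50 blocks (the 300x and 200x groups) qualify as hotspot blocks, and only the 30 blocks pending 100 accesses fall at or below the move-out bound.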
  • the embodiment of the present invention further provides a resource scheduling apparatus, which is used to implement the method provided by the embodiment shown in FIG. 1 to FIG.
• a task queue determining module 501, configured to determine a current task queue;
  • the task queue determination module 501 determines a current task queue that includes a plurality of application tasks to be executed.
  • the first number determining module 502 is configured to determine, in the data block in the disk to be accessed by the application task, the number of times each data block is to be accessed by the application task;
• The application tasks in the task queue need to access data blocks located in the disk. The first number determining module 502 determines, among those data blocks, the number of times each is to be accessed by the application tasks of the task queue.
  • the hotspot data determining module 503 is configured to determine a hotspot data block according to the number of times each data block is to be accessed by the application task;
  • the hotspot data determining module 503 determines the hotspot data block according to the number of times each data block is to be accessed by the application task.
• A hotspot data block is a data block in the disk that is to be accessed many times. There are many methods for determining hotspot data blocks, which are described in detail in the following embodiments and are not limited herein.
• the move-in instruction sending module 504 is configured to send a move-in instruction to the local node of the hotspot data block.
  • the move-in instruction sending module 504 sends a move-in instruction to the local node of the hotspot data block, where the move-in instruction is used to indicate that the local node of the hotspot data block moves the hotspot data block from the disk into the memory.
• This makes the hotspot data block accessible in memory. The local node of a hotspot data block is the node where that block is located; it preferentially moves the hotspot data block from its disk into its own local memory.
• This embodiment provides a resource scheduling apparatus, in which: the task queue determining module 501 determines a current task queue; the first number determining module 502 determines, among the data blocks in the disk to be accessed by the application tasks, the number of times each data block is to be accessed; the hotspot data determining module 503 determines hotspot data blocks according to those counts; and the move-in instruction sending module 504 sends a move-in instruction to the local node of each hotspot data block, instructing it to move the block into memory so that the block can be accessed there.
• Because the hotspot degree of a data block is determined from the application tasks to be executed in the current task queue, a determined hotspot data block is guaranteed to be accessed many times by pending tasks after it is moved into memory. The resource scheduling apparatus provided by this embodiment can therefore significantly improve data IO efficiency, shortening the running time of application tasks without causing unnecessary waste of memory resources.
• The embodiment shown in FIG. 5 gives the basic structure of the resource scheduling apparatus provided by the embodiments of the present invention, in which the hotspot data determining module 503 determines the hotspot degree of data blocks from the application tasks to be executed in the current task queue.
• The hotspot data determining module 503 may be specifically configured to determine the top M data blocks ranked by the number of times they are to be accessed by application tasks as hotspot data blocks, or to determine every data block whose access count is not less than N as a hotspot data block, where M and N are preset values.
  • the hotspot data determining module 503 can also determine the hotspot data block according to the number of times that each data block is to be accessed by the application task by other methods, which is not limited herein.
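The two rules just described (top-M by access count, or count at least N) can be sketched as follows. This is an illustrative model; the function names and sample counts are ours, not from the patent.

```python
def top_m(counts, m):
    """The m on-disk blocks with the most pending accesses."""
    return set(sorted(counts, key=counts.get, reverse=True)[:m])

def at_least_n(counts, n):
    """Every on-disk block with at least n pending accesses."""
    return {block for block, c in counts.items() if c >= n}

counts = {"a": 300, "b": 200, "c": 100}
hot_by_rank = top_m(counts, 2)         # the two hottest blocks
hot_by_threshold = at_least_n(counts, 150)
```

The rank-based rule bounds how many blocks can be moved in at once, while the threshold rule lets the hotspot set grow or shrink with the workload; the module may use either.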
• The resource scheduling apparatus may further include a task node scheduling module, configured to, after the move-in instruction sending module 504 sends the move-in instruction to the local node of the hotspot data block, schedule the application task corresponding to the hotspot data block onto that local node, so that the task does not need to access the hotspot data block across nodes, improving the data IO efficiency of the system.
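Scheduling a task onto the node that now holds its block in memory avoids a cross-node read. A minimal sketch, assuming a map from block to its local node is available; all names here are hypothetical, not from the patent.

```python
def place_tasks(tasks, block_to_node):
    """Map each (task_id, block_id) pair to the node holding that block locally."""
    return {task: block_to_node[block] for task, block in tasks}

placement = place_tasks(
    [("t1", "blk1"), ("t2", "blk2")],
    {"blk1": "node-a", "blk2": "node-b"},  # node that moved each block into memory
)
```

In a real cluster the block-to-node map would come from the file system's metadata service, and a fallback node would be needed for blocks not yet memory-resident.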
• FIG. 5 explains in detail how the resource scheduling apparatus provided by the present invention determines and schedules hotspot data blocks. Another resource scheduling apparatus is provided below which, building on the embodiment shown in FIG. 5, further schedules the work tasks corresponding to the hotspot data blocks; see FIG. 6. Its basic structure includes:
• a task queue determining module 601, configured to determine a current task queue;
  • the first number determining module 602 is configured to determine, in the data block in the disk to be accessed by the application task, the number of times each data block is to be accessed by the application task;
  • the hotspot data determining module 603 is configured to determine a hotspot data block according to the number of times each data block is to be accessed by the application task;
• a move-in instruction sending module 604, configured to send a move-in instruction to the local node of the hotspot data block;
  • the modules 601 to 604 are substantially the same as the modules 501 to 504, and are not described herein.
• The first sequence scheduling module 605 is configured to, after the move-in instruction sending module 604 sends the move-in instruction to the local node of the hotspot data block, execute the application tasks corresponding to each hotspot data block in descending order of the number of times the blocks are to be accessed by application tasks.
• After the move-in instruction sending module 604 sends the move-in instruction to the local node of the hotspot data block, that node moves the hotspot data block into memory, so application tasks in the task queue can access hotspot data blocks directly from memory.
• The resource scheduling apparatus is further configured to schedule the execution order of the application tasks in the work queue: the first sequence scheduling module 605 executes the application tasks corresponding to each hotspot data block in descending order of the number of times the blocks are to be accessed.
• This embodiment provides a resource scheduling apparatus, in which: the task queue determining module 601 determines a current task queue; the first number determining module 602 determines, among the data blocks in the disk to be accessed by the application tasks, the number of times each data block is to be accessed; the hotspot data determining module 603 determines hotspot data blocks according to those counts; the move-in instruction sending module 604 sends a move-in instruction to the local node of each hotspot data block, instructing it to move the block into memory so that the block can be accessed there; and the first sequence scheduling module 605 executes the application tasks corresponding to each hotspot data block in descending order of access count.
• Because the hotspot degree of a data block is determined from the application tasks to be executed in the current task queue, a determined hotspot data block is guaranteed to be accessed many times by pending tasks after it is moved into memory. The resource scheduling apparatus can therefore significantly improve data IO efficiency, shortening the running time of application tasks without causing unnecessary waste of memory resources.
• The application tasks corresponding to each hotspot data block are executed in descending order of the number of times the blocks are to be accessed, so tasks corresponding to data blocks with a higher hotspot degree run first; this optimizes the execution order of application tasks in the work queue and improves the efficiency with which the system executes them.
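This descending-order policy is a single sort over the queue. An illustrative sketch (the names and sample counts are ours, not from the patent):

```python
def order_by_block_heat(tasks, counts):
    """Sort (task_id, block_id) pairs so tasks on the hottest block run first."""
    return sorted(tasks, key=lambda t: counts[t[1]], reverse=True)

queue = [("t1", "warm"), ("t2", "hot"), ("t3", "cool")]
heat = {"hot": 300, "warm": 200, "cool": 100}
ordered = order_by_block_heat(queue, heat)
```

Python's `sorted` is stable, so tasks on equally hot blocks keep their original queue order, which matches the intent of reordering only by hotspot degree.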
  • the resource scheduling device can also schedule the execution order of the application tasks in the work queue by other methods.
  • the resource scheduling apparatus may further include a visit number determining module, configured to determine the number of hotspot data blocks to be accessed by each application task.
• The first sequence scheduling module 605 may be replaced with a second sequence scheduling module, configured to, after the move-in instruction sending module 604 sends the move-in instruction to the local node of the hotspot data block, execute the application tasks in descending order of the number of hotspot data blocks each is to access. In this way, tasks that access hotspot data blocks frequently are executed first, which also optimizes the execution order of application tasks in the work queue and improves the efficiency of the system in executing them.
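The alternative policy orders tasks by how many hotspot blocks each one touches. A sketch, assuming each task lists the blocks it will access; the function name and sample data are ours, not from the patent.

```python
def order_by_hot_touch(task_blocks, hot_blocks):
    """Run first the tasks touching the most hotspot blocks."""
    return sorted(task_blocks,
                  key=lambda t: len(set(task_blocks[t]) & hot_blocks),
                  reverse=True)

tasks = {"t1": ["a"], "t2": ["a", "b"], "t3": ["c"]}
order = order_by_hot_touch(tasks, {"a", "b"})  # "c" is not a hotspot block
```

Here t2 touches two hotspot blocks, t1 one, and t3 none, so t2 runs first; this differs from the first policy, which ranks by the heat of a single block rather than by how many hot blocks a task uses.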
  • the apparatus provided in the embodiment shown in FIG. 6 is capable of optimizing the execution order of application tasks in the work queue after moving the hotspot data blocks into the memory.
• However, the number of data blocks that memory can accommodate is limited. As hotspot data blocks are continuously moved in, once the number of data blocks in memory reaches that upper limit, the memory can no longer accept new hotspot data blocks. The present invention therefore provides a further embodiment for ensuring that the memory of a node has sufficient space for newly moved-in hotspot data blocks; please refer to FIG. 7.
  • the basic structure of another resource scheduling apparatus according to an embodiment of the present invention includes:
  • a task queue determining module 701, configured to determine a current task queue
  • the first number determining module 702 is configured to determine, in the data block in the disk to be accessed by the application task, the number of times each data block is to be accessed by the application task;
  • the hotspot data determining module 703 is configured to determine a hotspot data block according to the number of times each data block is to be accessed by the application task;
• a move-in instruction sending module 704, configured to send a move-in instruction to the local node of the hotspot data block;
• The first sequence scheduling module 705 is configured to, after the move-in instruction sending module 704 sends the move-in instruction to the local node of the hotspot data block, execute the application tasks corresponding to each hotspot data block in descending order of the number of times the blocks are to be accessed by application tasks.
  • the modules 701 to 705 are substantially the same as the modules 601 to 605, and are not described herein.
  • a second number determining module 706, configured to determine a number of times that each data block in the memory is to be accessed by the application task
  • the second number determination module 706 determines the number of times in the data block in memory that each data block is to be accessed by an application task in the task queue.
  • the non-hotspot data determining module 707 is configured to determine, according to the number of times the data block in each memory is to be accessed by the application task, the non-hotspot data block;
• The non-hotspot data determining module 707 determines non-hotspot data blocks according to the number of times each data block in memory is to be accessed by application tasks; a non-hotspot data block is a data block located in memory that is to be accessed few times. There are many ways to identify them: for example, the P data blocks in memory that are to be accessed the fewest times may be determined as non-hotspot data blocks, where P is a preset value; or the data blocks in memory whose access count is not greater than Q may be determined as non-hotspot data blocks, where Q is a preset value.
  • the non-hotspot data determining module 707 can also determine the non-hotspot data block by other methods, which is not limited herein.
  • the move-out instruction sending module 708 is configured to send a move-out instruction to the local node of the non-hotspot data block.
• The move-out instruction sending module 708 sends a move-out instruction to the local node of the non-hotspot data block, where the move-out instruction instructs that node to move the non-hotspot data block out of memory to the disk.
• This embodiment provides a resource scheduling apparatus, in which: the task queue determining module 701 determines a current task queue; the first number determining module 702 determines, among the data blocks in the disk to be accessed by the application tasks, the number of times each data block is to be accessed; the hotspot data determining module 703 determines hotspot data blocks according to those counts; the move-in instruction sending module 704 sends a move-in instruction to the local node of each hotspot data block, instructing it to move the block into memory so that the block can be accessed there; the first sequence scheduling module 705 executes the application tasks corresponding to each hotspot data block in descending order of access count; the second number determining module 706 determines the number of times each data block in memory is to be accessed by application tasks; the non-hotspot data determining module 707 determines non-hotspot data blocks according to those counts; and the move-out instruction sending module 708 sends a move-out instruction to the local node of each non-hotspot data block, instructing it to move the block out of memory to the disk.
• Because the hotspot degree of a data block is determined from the application tasks to be executed in the current task queue, a determined hotspot data block is guaranteed to be accessed many times by pending tasks after it is moved into memory. The resource scheduling apparatus provided by this embodiment can therefore significantly improve data IO efficiency, shortening the running time of application tasks without causing unnecessary waste of memory resources.
• The application tasks corresponding to each hotspot data block are executed in descending order of the number of times the blocks are to be accessed, so tasks corresponding to data blocks with a higher hotspot degree run first; this optimizes the execution order of application tasks in the work queue and improves the efficiency with which the system executes them.
• In addition, the non-hotspot data determining module 707 determines the non-hotspot data blocks in memory, and the move-out instruction sending module 708 instructs their local nodes to remove them, so that the data blocks kept in memory are all blocks with a high hotspot degree, achieving dynamic optimization of hotspot data blocks in memory.
• The resource scheduling apparatuses provided by the embodiments shown in FIG. 5 to FIG. 7 determine hotspot data according to the work queue. A more detailed resource scheduling apparatus is provided below, which explains in detail how the work queue is determined; see FIG. 8. It includes:
  • the task queue determining module 801 is configured to determine a current task queue.
  • the task queue determining module 801 specifically includes:
• an instruction receiving unit 8011, configured to receive, within a preset time period, execution instructions of work to be executed delivered by the client;
• The instruction receiving unit 8011 receives, within a preset time period, an execution instruction of a job to be executed issued by the client; the execution instruction instructs the resource scheduling apparatus to perform the job.
  • the preset time period may be an artificially set time period, or may be a default time period of the resource scheduling device, or may be other time segments, which is not limited herein.
  • the task dividing unit 8012 is configured to divide the work to be performed into a plurality of application tasks to be executed, and determine the set of the plurality of application tasks to be executed as the current task queue;
  • each work to be performed can be divided into one or more application tasks to be executed, and each application task to be executed needs to access one data block.
• For example, a job to be performed needs to access a 128M data file, and the size of each data block in the distributed file system is 32M, so the resource scheduling device divides the job into four application tasks to be executed.
  • Each application task to be executed is used to access a 32M data block.
• The work to be performed in this embodiment may include only one job, but more typically it includes multiple jobs; the task dividing unit 8012 divides those jobs into multiple application tasks to be executed and determines the set of those tasks as the current task queue.
  • the first number determining module 802 is configured to determine, in the data block in the disk to be accessed by the application task, the number of times each data block is to be accessed by the application task;
  • the hotspot data determining module 803 is configured to determine a hotspot data block according to the number of times each data block is to be accessed by the application task;
• a move-in instruction sending module 804, configured to send a move-in instruction to the local node of the hotspot data block;
• The first sequence scheduling module 805 is configured to, after the move-in instruction sending module 804 sends the move-in instruction to the local node of the hotspot data block, execute the application tasks corresponding to each hotspot data block in descending order of the number of times the blocks are to be accessed by application tasks.
  • a second number determining module 806, configured to determine a number of times that each data block in the memory is to be accessed by the application task
  • the non-hotspot data determining module 807 is configured to determine, according to the number of times the data block in each memory is to be accessed by the application task, the non-hotspot data block;
  • the move-out instruction sending module 808 is configured to send a move-out instruction to the local node of the non-hotspot data block.
  • Modules 802 through 808 are substantially identical to modules 702 through 708 and are not described herein.
• This embodiment provides a resource scheduling apparatus, in which: the instruction receiving unit 8011 receives, within a preset time period, execution instructions of work to be executed delivered by the client; the task dividing unit 8012 divides the work into multiple application tasks to be executed and determines the set of those tasks as the current task queue; the first number determining module 802 determines, among the data blocks in the disk to be accessed by the application tasks, the number of times each data block is to be accessed; the hotspot data determining module 803 determines hotspot data blocks according to those counts; the move-in instruction sending module 804 sends a move-in instruction to the local node of each hotspot data block, instructing it to move the block into memory so that the block can be accessed there; the first sequence scheduling module 805 executes the application tasks corresponding to each hotspot data block in descending order of access count; the second number determining module 806 determines the number of times each data block in memory is to be accessed by application tasks; the non-hotspot data determining module 807 determines non-hotspot data blocks according to those counts; and the move-out instruction sending module 808 sends a move-out instruction to the local node of each non-hotspot data block, instructing it to move the block out of memory to the disk.
• Because the hotspot degree of a data block is determined from the application tasks to be executed in the current task queue, a determined hotspot data block is guaranteed to be accessed many times by pending tasks after it is moved into memory. The resource scheduling apparatus can therefore significantly improve data IO efficiency, shortening the running time of application tasks without causing unnecessary waste of memory resources.
• In addition, the first sequence scheduling module 805 schedules the execution order of application tasks in the work queue so that tasks corresponding to hotspot data blocks with higher access counts are executed preferentially.
• The non-hotspot data determining module 807 also determines the non-hotspot data blocks in memory, and the move-out instruction sending module 808 instructs their local nodes to remove them, so that the data blocks kept in memory are all blocks with a high hotspot degree, achieving dynamic optimization of hotspot data blocks in memory.
• For example, the instruction receiving unit 8011 receives, within the preset time period, an execution instruction of work to be executed delivered by the client; the instruction directs the resource scheduling device to perform the work, which needs to access a 128M data file. Each data block in the distributed file system is 32M, so the task dividing unit 8012 divides the work into four application tasks to be executed, each used to access one 32M data block. The task dividing unit 8012 determines the set of these four application tasks as the current task queue.
• The first number determining module 802 determines, among the data blocks in the disk to be accessed by the application tasks of the task queue, the number of times each data block is to be accessed, obtaining 100 data blocks in total: 20 data blocks are to be accessed 300 times, 30 data blocks 200 times, and 50 data blocks 100 times.
  • the hotspot data determining module 803 determines a data block whose number of times to be accessed by the application task is not less than 150 as a hotspot data block, that is, 20 data blocks to be accessed 300 times and 30 data blocks that are accessed 200 times are determined. It is a hot data block.
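The counting and selection steps can be sketched as below. This is a hedged illustration rather than the patented implementation itself: each queued task is assumed to carry the list of block IDs it will access, and the threshold N follows the example value of 150 by default.

```python
from collections import Counter

def pending_access_counts(task_queue):
    """Count, per disk block, how many accesses the queued tasks will make."""
    counts = Counter()
    for task in task_queue:
        counts.update(task["blocks"])
    return counts

def hotspot_blocks(counts, n=150):
    """A block whose pending access count is not less than N is a hotspot block."""
    return {block for block, count in counts.items() if count >= n}

queue = [{"blocks": ["b1", "b1", "b2"]}, {"blocks": ["b1"]}]
counts = pending_access_counts(queue)  # b1 pending 3 accesses, b2 pending 1
hot = hotspot_blocks(counts, n=2)      # only b1 qualifies as a hotspot
```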
  • The move-in instruction sending module 804 sends a move-in instruction to the local node of each hotspot data block. After receiving the move-in instruction, the local node moves the hotspot data block from disk into memory, so that the hotspot data block can be accessed in memory.
  • The first sequence scheduling module 805 executes the application tasks in descending order of pending access count: it first executes the tasks corresponding to the 20 data blocks to be accessed 300 times, and then the tasks corresponding to the 30 data blocks to be accessed 200 times.
  • There are now 60 data blocks in memory. After the application tasks corresponding to the hotspot data blocks have been executed, the second number determining module 806 determines, for each data block in memory, the number of times it is still to be accessed by the application tasks in the task queue: 30 data blocks are to be accessed 100 times each, and 30 data blocks are to be accessed 160 times each.
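The ordering performed by the scheduling module can be illustrated roughly as follows. This is a sketch under the assumption that the pending access count and the task list of each hotspot block are already known; the dict-based representation is hypothetical.

```python
def order_tasks_by_block_access_count(hot_block_counts, tasks_for_block):
    """Run the tasks of the most-accessed hotspot blocks first."""
    ordered = []
    # Visit hotspot blocks in descending order of pending access count.
    for block in sorted(hot_block_counts, key=hot_block_counts.get, reverse=True):
        ordered.extend(tasks_for_block[block])
    return ordered

counts = {"b300": 300, "b200": 200}
tasks = {"b300": ["t1", "t2"], "b200": ["t3"]}
# Tasks for the block pending 300 accesses run before those for the 200-access block.
order = order_tasks_by_block_access_count(counts, tasks)  # ["t1", "t2", "t3"]
```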
  • The non-hotspot data determining module 807 determines the in-memory data blocks whose pending access count is not greater than 150 as non-hotspot data blocks; that is, the 30 data blocks to be accessed 100 times by the application tasks in the task queue are determined as non-hotspot data blocks.
  • The move-out instruction sending module 808 sends a move-out instruction to the local node of each non-hotspot data block; after receiving the move-out instruction, the local node moves the non-hotspot data block out of memory and back to disk.
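The eviction step can be sketched in the same style. This is illustrative only: the threshold Q follows the example value of 150, and `send_move_out` is a hypothetical stand-in for the move-out instruction, which here merely records the evicted block.

```python
def non_hotspot_blocks(in_memory_counts, q=150):
    """An in-memory block whose pending access count is not greater than Q is non-hotspot."""
    return {block for block, count in in_memory_counts.items() if count <= q}

moved_out = []

def send_move_out(block):
    """Stand-in for the move-out instruction sent to the block's local node."""
    moved_out.append(block)

def evict(in_memory_counts, q=150):
    """Send a move-out instruction for every non-hotspot block; return the evicted set."""
    evicted = non_hotspot_blocks(in_memory_counts, q)
    for block in evicted:
        send_move_out(block)  # the local node then moves the block from memory to disk
    return evicted

# Blocks pending 100 accesses are evicted; blocks pending 160 stay in memory.
evicted = evict({"a": 100, "b": 160})  # {"a"}
```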
  • The resource scheduling apparatus in the embodiments of the present invention has been described above from the perspective of unitized functional entities; it is described below from the perspective of hardware processing.
  • Referring to FIG. 9, another embodiment of the resource scheduling apparatus 900 in the embodiments of the present invention includes:
  • an input device 901, an output device 902, a processor 903, and a memory 904 (the resource scheduling apparatus 900 may include one or more processors 903; one processor 903 is taken as an example in FIG. 9).
  • In some embodiments of the present invention, the input device 901, the output device 902, the processor 903, and the memory 904 may be connected by a bus or in another manner; connection by a bus is taken as an example in FIG. 9.
  • By invoking operation instructions stored in the memory 904, the processor 903 is configured to perform the following steps: determine the current task queue, which includes multiple application tasks to be executed; determine, among the data blocks on disk to be accessed by the application tasks, the number of times each data block is to be accessed; determine hotspot data blocks according to those counts; and send a move-in instruction to the local node of each hotspot data block, the move-in instruction indicating that the hotspot data block is to be moved into memory so that it can be accessed in memory.
  • In some embodiments of the present invention, the processor 903 further performs the following step: determine the M data blocks with the highest pending access counts as hotspot data blocks, where M is a preset value; or determine the data blocks whose pending access count is not less than N as hotspot data blocks, where N is a preset value.
  • In some embodiments of the present invention, the processor 903 further performs the following step: after sending the move-in instruction to the local node of a hotspot data block, if that local node currently has a free slot, schedule the application task corresponding to the hotspot data block onto the local node.
  • In some embodiments of the present invention, the processor 903 further performs the following step: after sending the move-in instructions, execute the application task corresponding to each hotspot data block in descending order of pending access count.
  • In some embodiments of the present invention, the processor 903 further performs the following steps: determine the number of hotspot data blocks each application task is to access, and, after sending the move-in instructions, execute the application tasks in descending order of that number.
  • In some embodiments of the present invention, the processor 903 further performs the following steps: determine, for each data block in memory, the number of times it is to be accessed by the application tasks; determine the P in-memory data blocks with the fewest pending accesses as non-hotspot data blocks, where P is a preset value, or determine the in-memory data blocks whose pending access count is not greater than Q as non-hotspot data blocks, where Q is a preset value; and send a move-out instruction to the local node of each non-hotspot data block, the move-out instruction indicating that the non-hotspot data block is to be moved out of memory.
  • In some embodiments of the present invention, the processor 903 further performs the following steps: receive, within a preset time period, an execution instruction for a job delivered by a client; divide the job into multiple application tasks to be executed; and determine the set of these application tasks as the current task queue.
  • It may be clearly understood by a person skilled in the art that, for convenient and brief description, for the detailed working processes of the foregoing system, modules, and units, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
  • In the several embodiments provided in this application, it should be understood that the disclosed system and method may be implemented in other ways. For example, the described system embodiments are merely illustrative: the division into units is only a logical function division, and there may be other divisions in actual implementation; multiple units or components may be combined or integrated into another system, and some features may be ignored or not performed.
  • In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, modules, or units, and may be electrical, mechanical, or in other forms.
  • The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each of the units may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
  • When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention.
  • The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.


Abstract

Embodiments of the present invention provide a resource scheduling method for improving data IO efficiency. The resource scheduling method provided by the embodiments of the present invention includes: determining a current task queue, the task queue including multiple application tasks to be executed; determining, among the data blocks on disk to be accessed by the application tasks, the number of times each data block is to be accessed by the application tasks; determining hotspot data blocks according to the number of times each data block is to be accessed; and sending a move-in instruction to the local node of each hotspot data block, the move-in instruction indicating that the hotspot data block is to be moved into memory so that it can be accessed in memory. Embodiments of the present invention further provide a related resource scheduling apparatus.

Description

一种资源调度方法以及相关装置 技术领域
本发明涉及数据处理领域,尤其涉及一种资源调度方法以及相关装置。
背景技术
近年来,社会经济不断进步,科学技术飞速发展,大、中、小型企业的数据规模也随之快速膨胀。如何提高大数据的存储效率与访问速度,具有重要的实际应用价值。分布式文件系统是一种适合进行大数据存储的文件管理系统,该系统中的物理存储资源不一定连接在本地节点上,而是通过计算机网络与多个节点相连。分布式文件系统中,大的数据块被分为多个小的数据块存储在多个节点上,使得分布式文件系统有着较高的容错性与吞吐量。其中,Hadoop分布式文件系统(HDFS,Hadoop Distributed File System)是一种常用的高度容错性的文件系统,能够被部署在廉价的机器上,非常适合在大规模数据集上的应用。在HDFS中,数据被集合成数据块(block)存储于数据节点(DN,DataNode)的磁盘中,应用任务(task)可以从磁盘中的数据块中读取数据,或将数据写入磁盘中的数据块中。但是在大规模数据的分析任务中,磁盘中的数据需要被应用任务反复读写,导致数据输入输出(IO)花费了大量的时间,任务运行时间过长。
由于内存的IO速度要远远快于磁盘的IO速度,因此现阶段的HDFS在进行资源调度时,会统计每个数据块在预置时间段内的历史被访问次数,然后将历史被访问次数较多的数据块确定为热点数据块并移入DN的内存中,使得应用任务可以直接通过DN的内存来访问热点数据块,提升了数据IO效率。
但是,数据块的历史被访问次数并不能准确的反应该数据块的热点程度,即使该数据块的历史被访问次数较多,该数据块在移入内存中后被访问的次数也可能很少。在这种情况下,若将该数据块移入内存中,则不仅不能显著的提升数据IO效率,还会造成不必要的内存资源浪费。
发明内容
本发明实施例提供了一种资源调度方法,可以提升系统的数据IO效率。
本发明实施例第一方面提供了一种资源调度方法,包括:
确定当前的任务队列,所述任务队列中包括多个待执行的应用任务;
确定所述应用任务所要访问的磁盘中的数据块中,每个数据块待被所述应用任务访问的次数;
根据所述每个数据块待被所述应用任务访问的次数,确定热点数据块;
向所述热点数据块的本地节点发送移入指令,所述移入指令用于指示将所述热点数据块移入内存,使得所述热点数据块可以在内存中被访问。
结合本发明实施例的第一方面,本发明实施例的第一方面的第一种实现方式中,所述根据所述每个数据块待被所述应用任务访问的次数,确定热点数据块包括:
将待被所述应用任务访问的次数最高的前M个数据块,确定为热点数据块,所述M为预置数值;
或,
将待被所述应用任务访问的次数不小于N的数据块,确定为热点数据块,所述N为预置数值。
结合本发明实施例的第一方面或第一方面的第一种实现方式,本发明实施例的第一方面的第二种实现方式中,所述向所述热点数据块的本地节点发送移入指令之后还包括:
若所述热点数据块的本地节点当前具有空闲的槽位slot,则将所述热点数据块对应的应用任务调度到所述热点数据块的本地节点上。
结合本发明实施例的第一方面或第一方面的第一种实现方式,本发明实施例的第一方面的第三种实现方式中,所述向所述热点数据块的本地节点发送移入指令之后还包括:
按照待被所述应用任务访问的次数从多到少的顺序,依次执行每个所述热点数据块对应的应用任务。
结合本发明实施例的第一方面或第一方面的第一种实现方式,本发明实施例的第一方面的第四种实现方式还包括:
确定每个所述应用任务所要访问的热点数据块的个数;
所述向所述热点数据块的本地节点发送移入指令之后还包括:
按照要访问的热点数据块的个数从多到少的顺序,依次执行每个所述应用任务。
结合本发明实施例的第一方面、或第一方面的第一种至第四种实现方式中的任一项,本发明实施例的第一方面的第五种实现方式还包括:
确定每个内存中的数据块待被所述应用任务访问的次数;
将待被所述应用任务访问的次数最少的前P个内存中的数据块确定为非热点数据块,所述P为预置数值,或,将待被所述应用任务访问的次数不大于Q的内存中的数据块确定为非热点数据块,所述Q为预置数值;
向所述非热点数据块的本地节点发送移出指令,所述移出指令用于指示将所述非热点数据块从内存中移出。
结合本发明实施例的第一方面、或第一方面的第一种至第五种实现方式中的任一项,本发明实施例的第一方面的第六种实现方式中,所述确定当前的任务队列包括:
接收预置时间段内,客户端下发的待执行的工作的执行指令;
将所述待执行的工作划分为多个待执行的应用任务,并将所述多个待执行的应用任务的集合确定为当前的任务队列。
本发明实施例第二方面提供了一种资源调度装置,包括:
任务队列确定模块,用于确定当前的任务队列,所述任务队列中包括多个待执行的应用任务;
第一次数确定模块,用于确定所述应用任务所要访问的磁盘中的数据块中,每个数据块待被所述应用任务访问的次数;
热点数据确定模块,用于根据所述每个数据块待被所述应用任务访问的次数,确定热点数据块;
移入指令发送模块,用于向所述热点数据块的本地节点发送移入指令,所述移入指令用于指示将所述热点数据块移入内存,使得所述热点数据块可以在内存中被访问。
结合本发明实施例的第二方面,本发明实施例的第二方面的第一种实现方 式中,所述热点数据确定模块具体用于:
将待被所述应用任务访问的次数最高的前M个数据块,确定为热点数据块,所述M为预置数值;
或,
将待被所述应用任务访问的次数不小于N的数据块,确定为热点数据块,所述N为预置数值。
结合本发明实施例的第二方面或第二方面的第一种实现方式,本发明实施例的第二方面的第二种实现方式中,所述装置还包括:
任务节点调度模块,用于在所述移入指令发送模块向所述热点数据块的本地节点发送移入指令之后,当所述热点数据块的本地节点当前具有空闲的槽位slot时,将所述热点数据块对应的应用任务调度到所述热点数据块的本地节点上。
结合本发明实施例的第二方面或第二方面的第一种实现方式,本发明实施例的第二方面的第三种实现方式中,所述装置还包括:
第一顺序调度模块,用于在所述移入指令发送模块向所述热点数据块的本地节点发送移入指令之后,按照待被所述应用任务访问的次数从多到少的顺序,依次执行每个所述热点数据块对应的应用任务。
结合本发明实施例的第二方面或第二方面的第一种实现方式,本发明实施例的第二方面的第四种实现方式中,所述装置还包括:
访问个数确定模块,用于确定每个所述应用任务所要访问的热点数据块的个数;
所述装置还包括:
第二顺序调度模块,用于在所述移入指令发送模块向所述热点数据块的本地节点发送移入指令之后,按照要访问的热点数据块的个数从多到少的顺序,依次执行每个所述应用任务。
结合本发明实施例的第二方面、或第二方面的第一种至第四种实现方式中的任一项,本发明实施例的第二方面的第五种实现方式中,所述装置还包括:
第二次数确定模块,用于确定每个内存中的数据块待被所述应用任务访问的次数;
非热点数据确定模块,用于将待被所述应用任务访问的次数最少的前P个内存中的数据块确定为非热点数据块,所述P为预置数值,或,将待被所述应用任务访问的次数不大于Q的内存中的数据块确定为非热点数据块,所述Q为预置数值;
移出指令发送模块,用于向所述非热点数据块的本地节点发送移出指令,所述移出指令用于指示将所述非热点数据块从内存中移出。
结合本发明实施例的第二方面、或第二方面的第一种至第五种实现方式中的任一项,本发明实施例的第二方面的第六种实现方式中,所述任务队列确定模块包括:
指令接收单元,用于接收预置时间段内,客户端下发的待执行的工作的执行指令;
任务划分单元,用于将所述待执行的工作划分为多个待执行的应用任务,并将所述多个待执行的应用任务的集合确定为当前的任务队列。
本发明实施例的第三方面提供了一种资源调度装置,包括输入装置、输出装置、处理器和存储器,其特征在于,调用存储器存储的操作指令,所述处理器用于执行如下步骤:
确定当前的任务队列,所述任务队列中包括多个待执行的应用任务;
确定所述应用任务所要访问的磁盘中的数据块中,每个数据块待被所述应用任务访问的次数;
根据所述每个数据块待被所述应用任务访问的次数,确定热点数据块;
向所述热点数据块的本地节点发送移入指令,所述移入指令用于指示将所述热点数据块移入内存,使得所述热点数据块可以在内存中被访问。
结合本发明实施例的第三方面,本发明实施例的第三方面的第一种实现方式中,处理器还执行如下步骤:
将待被所述应用任务访问的次数最高的前M个数据块,确定为热点数据块,所述M为预置数值;
或,
将待被所述应用任务访问的次数不小于N的数据块,确定为热点数据块,所述N为预置数值。
结合本发明实施例的第三方面或第三方面的第一种实现方式,本发明实施例的第三方面的第二种实现方式中,处理器还执行如下步骤:
在向所述热点数据块的本地节点发送移入指令之后,若所述热点数据块的本地节点当前具有空闲的槽位slot,则将所述热点数据块对应的应用任务调度到所述热点数据块的本地节点上。
结合本发明实施例的第三方面或第三方面的第一种实现方式,本发明实施例的第三方面的第三种实现方式中,处理器还执行如下步骤:
在向所述热点数据块的本地节点发送移入指令之后,按照待被所述应用任务访问的次数从多到少的顺序,依次执行每个所述热点数据块对应的应用任务。
结合本发明实施例的第三方面或第三方面的第一种实现方式,本发明实施例的第三方面的第四种实现方式中,处理器还执行如下步骤:
确定每个所述应用任务所要访问的热点数据块的个数;
所述向所述热点数据块的本地节点发送移入指令之后还包括:
按照要访问的热点数据块的个数从多到少的顺序,依次执行每个所述应用任务。
结合本发明实施例的第三方面、或第三方面的第一种至第四种实现方式中的任一项,本发明实施例的第三方面的第五种实现方式中,处理器还执行如下步骤:
确定每个内存中的数据块待被所述应用任务访问的次数;
将待被所述应用任务访问的次数最少的前P个内存中的数据块确定为非热点数据块,所述P为预置数值,或,将待被所述应用任务访问的次数不大于Q的内存中的数据块确定为非热点数据块,所述Q为预置数值;
向所述非热点数据块的本地节点发送移出指令,所述移出指令用于指示将所述非热点数据块从内存中移出。
结合本发明实施例的第三方面、或第三方面的第一种至第五种实现方式中的任一项,本发明实施例的第三方面的第六种实现方式中,处理器还执行如下步骤:
接收预置时间段内,客户端下发的待执行的工作的执行指令;
将所述待执行的工作划分为多个待执行的应用任务,并将所述多个待执行的应用任务的集合确定为当前的任务队列。
本发明实施例提供了一种资源调度方法,包括:确定当前的任务队列,该任务队列中包括多个待执行的应用任务;确定应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数;根据每个数据块待被应用任务访问的次数,确定热点数据块;向热点数据块的本地节点发送移入指令,该移入指令用于指示将所述热点数据块移入内存,使得所述热点数据块可以在内存中被访问。本发明实施例通过当前的任务队列中的待执行的应用任务来确定数据块的热点程度,保证了被确定的热点数据块在移入内存中后,被待执行的应用任务访问的次数较多。与现有技术中根据历史被访问次数确定热点数据块相比,本发明实施例提供的资源调度方法能够显著的提升数据IO效率,进而缩短应用任务的运行时间,不会造成不必要的内存资源浪费。
附图说明
图1为本发明实施例中资源调度方法一个实施例流程图;
图2为本发明实施例中资源调度方法另一个实施例流程图;
图3为本发明实施例中资源调度方法另一个实施例流程图;
图4为本发明实施例中资源调度方法另一个实施例流程图;
图5为本发明实施例中资源调度装置一个实施例流程图;
图6为本发明实施例中资源调度装置另一个实施例流程图;
图7为本发明实施例中资源调度装置另一个实施例流程图;
图8为本发明实施例中资源调度装置另一个实施例流程图;
图9为本发明实施例中资源调度装置另一个实施例流程图。
具体实施方式
本发明实施例提供了一种资源调度方法,可以提升数据IO效率。本发明还提出了相关的资源调度装置,以下将分别进行说明。
本发明实施例提供的资源调度方法的基本流程请参阅图1,主要包括:
101、确定当前的任务队列;
资源调度装置确定当前的任务队列,该任务队列中包括多个待执行的应用任务。
资源调度装置确定当前的任务队列的方法有很多,具体将在后面的实施例中详述,此处不做限定。
102、确定应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数;
任务队列中的应用任务需要访问位于磁盘中的数据块,本实施例中,资源调度装置确定任务队列的应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数。
103、根据每个数据块待被应用任务访问的次数,确定热点数据块;
资源调度装置根据每个数据块待被应用任务访问的次数,确定热点数据块。热点数据块为磁盘中的数据块中,待被访问次数较多的数据块,其确定方法有很多,具体将在后面的实施例中详述,此处不做限定。
104、向热点数据块的本地节点发送移入指令。
资源调度装置确定了热点数据块后,向热点数据块的本地节点发送移入指令,该移入指令用于指示热点数据块的本地节点将热点数据块从磁盘中移入内存,使得热点数据块可以在内存中被访问。其中,热点数据块的本地节点指的是热点数据块所在的节点。其中,热点数据块的本地节点优先将热点数据块从磁盘中移入本地的内存中。
本实施例提供了一种资源调度方法,包括:确定当前的任务队列;确定应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数;根据每个数据块待被应用任务访问的次数,确定热点数据块;向热点数据块的本地节点发送移入指令,该移入指令用于指示将所述热点数据块移入内存,使得所述热点数据块可以在内存中被访问。本实施例通过当前的任务队列中的待执行的应用任务来确定数据块的热点程度,保证了被确定的热点数据块在移入内存中后,被待执行的应用任务访问的次数较多。与现有技术中根据历史被访问次数确定热点数据块相比,本实施例提供的资源调度方法能够显著的提升数 据IO效率,进而缩短应用任务的运行时间,不会造成不必要的内存资源浪费。
图1所示的实施例给出了本发明实施例提供的资源调度方法的基本流程,其中,资源调度装置通过当前的任务队列中的待执行的应用任务来确定数据块的热点程度,其确定方法有很多。优选的,作为本发明的又一个实施例,步骤103中资源调度装置根据每个数据块待被应用任务访问的次数确定热点数据块的方法具体可以为:将待被该应用任务访问的次数最高的前M个数据块确定为热点数据块;或将待被该应用任务访问的次数不小于N的数据块确定为热点数据块。其中M、N均为预置数值。步骤103中资源调度装置也可以通过其它方法来根据每个数据块待被应用任务访问的次数确定热点数据块,此处不做限定。
优选的,作为本发明的又一个实施例,步骤104中资源调度装置向热点数据块的本地节点发送移入指令后,还可以判断热点数据块的本地节点当前是否有空闲的槽位(slot)。若确定热点数据块的本地节点当前具有空闲的slot,则将热点数据块对应的应用任务调度到热点数据块的本地节点上,使得该应用任务无需跨节点访问热点数据块,提升了系统的数据IO效率。
图1所示的实施例详细解释了本发明提供的资源调度方法如何确定并调度热点数据块,下面将提供另一种资源调度方法,可以在图1所示的实施例的基础上进一步的实现对热点数据块对应的工作任务进行调度,请参阅图2。其基本流程包括:
201、确定当前的任务队列;
202、确定应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数;
203、根据每个数据块待被应用任务访问的次数,确定热点数据块;
204、向热点数据块的本地节点发送移入指令。
步骤201至204与步骤101至104基本相同,此处不做赘述。
205、按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务。
资源调度装置向热点数据块的本地节点发送移入指令后,热点数据块的本地节点会将热点数据块移入内存中。此时,任务队列中的应用任务可以直接从 内存中访问热点数据块。本实施例中,资源调度装置还用于对工作队列中的应用任务的执行顺序进行调度,具体方法为:按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务。
本实施例提供了一种资源调度方法,包括:确定当前的任务队列;确定应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数;根据每个数据块待被应用任务访问的次数,确定热点数据块;向热点数据块的本地节点发送移入指令,该移入指令用于指示将所述热点数据块移入内存,使得所述热点数据块可以在内存中被访问;按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务。本实施例通过当前的任务队列中的待执行的应用任务来确定数据块的热点程度,保证了被确定的热点数据块在移入内存中后,被待执行的应用任务访问的次数较多。与现有技术中根据历史被访问次数确定热点数据块相比,本实施例提供的资源调度方法能够显著的提升数据IO效率,进而缩短应用任务的运行时间,不会造成不必要的内存资源浪费。同时通过按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务,使得热点程度较高的数据块对应的任务可以先被执行,优化了工作队列中的应用任务的执行顺序,提升了系统执行应用任务的效率。
资源调度装置还可以通过其他方法对工作队列中的应用任务的执行顺序进行调度。优选的,作为本发明的又一实施例,步骤205之前还可以包括步骤:资源调度装置确定每个应用任务所要访问的热点数据块的个数。步骤205可以被替换为:按照要访问的热点数据块的个数从多到少的顺序,依次执行每个应用任务。这样就可以使得访问热点数据块的次数较多的应用任务先被执行,同样可以优化工作队列中的应用任务的执行顺序,提升系统执行应用任务的效率。
图2所示的实施例所提供的方法给出了在将热点数据块移入内存中后,优化工作队列中的应用任务的执行顺序的方法。但在实际应用中,内存所能容纳的数据块的个数是有限的。在本发明提供的资源调度方法的应用过程中,热点数据块会被源源不断的移入内存中,当移入内存中的数据块的个数达到内存所能容纳的上限时,内存就没有能力接纳新的热点数据块。为了使本发明提供的 资源调度方法能够持续运行,需要保证内存具有足够的空间来容纳新的数据块。为此,本发明提供了一种新的实施例,用于保证节点的内存具有足够的空间来容纳新的数据块,请参阅图3,本发明实施例提供的又一种资源调度方法的基本流程包括:
301、确定当前的任务队列;
302、确定应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数;
303、根据每个数据块待被应用任务访问的次数,确定热点数据块;
304、向热点数据块的本地节点发送移入指令。
305、按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务。
步骤301至305与步骤201至205基本相同,此处不做赘述。
内存所能容纳的数据块的个数是有限的。在本发明提供的资源调度方法的应用过程中,热点数据块会被源源不断的移入内存中,当移入内存中的数据块的个数达到内存所能容纳的上限时,内存就没有能力接纳新的热点数据块。为了使本发明提供的资源调度方法能够持续运行,需要保证内存具有足够的空间来容纳新的数据块。因此本实施例中,资源调度装置会根据任务队列判断位于内存中的数据块中,哪些数据块待被访问的次数较少,并将待被访问的次数较少的数据块移出内存。具体的方法可以参阅步骤306至308。
306、确定每个内存中的数据块待被应用任务访问的次数;
资源调度装置确定位于内存中的数据块中,每个数据块待被任务队列中的应用任务访问的次数。
307、根据每个内存中的数据块待被应用任务访问的次数,确定非热点数据块;
资源调度装置根据每个内存中的数据块待被应用任务访问的次数,确定非热点数据块,该非热点数据块用于表示位于内存中的数据块中,待被应用任务访问的次数较少的数据块。非热点数据块的确认方法有很多,例如:将待被应用任务访问的次数最少的前P个内存中的数据块确定为非热点数据块,其中P为预置数值,或,将待被应用任务访问的次数不大于Q的内存中的数据块确 定为非热点数据块,其中Q为预置数值。资源调度装置也可以通过其他方法来确定非热点数据块,此处不做限定。
308、向非热点数据块的本地节点发送移出指令。
资源调度装置确定了非热点数据块后,向非点数据块的本地节点发送移出指令,该移出指令用于指示非热点数据块的本地节点将非热点数据块从内存中移出到磁盘中。
本实施例提供了一种资源调度方法,包括:确定当前的任务队列;确定应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数;根据每个数据块待被应用任务访问的次数,确定热点数据块;向热点数据块的本地节点发送移入指令,该移入指令用于指示将所述热点数据块移入内存,使得所述热点数据块可以在内存中被访问;按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务;确定每个内存中的数据块待被应用任务访问的次数;根据每个内存中的数据块待被应用任务访问的次数,确定非热点数据块;向非热点数据块的本地节点发送移出指令,指示非热点数据块的本地节点将非热点数据块从内存中移出到磁盘中。本实施例通过当前的任务队列中的待执行的应用任务来确定数据块的热点程度,保证了被确定的热点数据块在移入内存中后,被待执行的应用任务访问的次数较多。与现有技术中根据历史被访问次数确定热点数据块相比,本实施例提供的资源调度方法能够显著的提升数据IO效率,进而缩短应用任务的运行时间,不会造成不必要的内存资源浪费。同时通过按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务,使得热点程度较高的数据块对应的任务可以先被执行,优化了工作队列中的应用任务的执行顺序,提升了系统执行应用任务的效率。实施例中资源调度装置还会确定内存中的非热点数据块,并指示非热点数据块的本地节点将非热点数据块从内存中移出,使得内存中保存的数据块均为热点程度较高的数据库,实现了内存中热点数据块的动态优化。
图1至图3所示的实施例提供的资源调度方法均根据工作队列来确定热点数据,下面将提供一种更为细化的资源调度方法,详细解释如何确定工作队列,其基本流程请参阅图4,包括:
401、接收预置时间段内,客户端下发的待执行的工作的执行指令;
资源调度装置接收预置时间段内,客户端下发的待执行的工作(Job)的执行指令,该执行指令用于指示资源调度装置执行待执行的工作。
其中,预置时间段可以为人为设置的时间段,也可以为资源调度装置默认的时间段,也可以为其他时间段,此处不做限定。
402、将待执行的工作划分为多个待执行的应用任务,并将该多个待执行的应用任务的集合确定为当前的任务队列;
可以理解的,每个待执行的工作均可以被划分为一个或多个待执行的应用任务,每个待执行的应用任务需要访问一个数据块。例如,某个待执行的工作需要访问128M大小的数据文件,而分布式文件系统中每个数据块的大小为32M,于是资源调度装置将该待执行的工作划分为4个待执行的应用任务,每个待执行的应用任务用于访问一个32M的数据块。
本实施例中待执行的工作可以只包括一个工作,但是较为优选的,待执行的工作可以包括多个工作,资源调度装置将该多个工作划分为多个待执行的应用任务,并将该多个待执行的应用任务的集合确定为当前的任务队列。
403、确定应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数;
404、根据每个数据块待被应用任务访问的次数,确定热点数据块;
405、向热点数据块的本地节点发送移入指令;
406、按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务。
407、确定每个内存中的数据块待被应用任务访问的次数;
408、根据每个内存中的数据块待被应用任务访问的次数,确定非热点数据块;
409、向非热点数据块的本地节点发送移出指令。
步骤403至409与步骤302至308基本相同,此处不做赘述。
本实施例提供了一种资源调度方法,包括:接收预置时间段内,客户端下发的待执行的工作的执行指令;将待执行的工作划分为多个待执行的应用任务,并将该多个待执行的应用任务的集合确定为当前的任务队列;确定应用任 务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数;根据每个数据块待被应用任务访问的次数,确定热点数据块;向热点数据块的本地节点发送移入指令,该移入指令用于指示将所述热点数据块移入内存,使得所述热点数据块可以在内存中被访问;按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务;确定每个内存中的数据块待被应用任务访问的次数;根据每个内存中的数据块待被应用任务访问的次数,确定非热点数据块;向非热点数据块的本地节点发送移出指令,指示非热点数据块的本地节点将非热点数据块从内存中移出到磁盘中。本实施例通过当前的任务队列中的待执行的应用任务来确定数据块的热点程度,保证了被确定的热点数据块在移入内存中后,被待执行的应用任务访问的次数较多。与现有技术中根据历史被访问次数确定热点数据块相比,本实施例提供的资源调度方法能够显著的提升数据IO效率,进而缩短应用任务的运行时间,不会造成不必要的内存资源浪费。同时,本实施例中资源调度装置还对工作队列中的应用任务的执行顺序进行调度,使得待被应用任务访问的次数较多的热点数据块对应的任务被优先执行。本实施例中资源调度装置还会确定内存中的非热点数据块,并指示非热点数据块的本地节点将非热点数据块从内存中移出,使得内存中保存的数据块均为热点程度较高的数据库,实现了内存中热点数据块的动态优化。
为了便于理解上述实施例,下面将以上述实施例的一个具体的应用场景为例进行描述。
在分布式文件系统中,资源调度装置接收预置时间段内,客户端下发的待执行的工作的执行指令,该执行指令用于指示资源调度装置执行待执行的工作,该待执行的工作需要访问128M大小的数据文件。
分布式文件系统中每个数据块的大小为32M,于是资源调度装置将该待执行的工作划分为4个待执行的应用任务,每个待执行的应用任务用于访问一个32M的数据块。资源调度装置将该4个待执行的应用任务集合确定为当前的任务队列。
资源调度装置确定任务队列的应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数。得到共有100个数据块,其中20个数据 块被访问了300次,30个数据块被访问了200次,50个数据块被访问了100次。
资源调度装置将待被应用任务访问的次数不小于150的数据块确定为热点数据块,即,将被访问了300次的20个数据块与被访问了200次的30个数据块确定为热点数据块。
资源调度装置向热点数据块的本地节点发送移入指令,热点数据块的本地节点接收到移入指令后,将热点数据块从磁盘中移入内存,使得热点数据块可以在内存中被访问。
资源调度装置按照待被应用任务访问的次数从多到少的顺序,先执行被访问了300次的20个数据块对应的应用任务,再执行被访问了300次的20个数据块对应的应用任务。
当前内存中已有60个数据块,资源调度装置执行了热点数据块对应的应用任务后,确定位于内存中的数据块中,每个数据块待被任务队列中的应用任务访问的次数。得到有30个数据块被任务队列中的应用任务访问的次数为100次,有30个数据块被任务队列中的应用任务访问的次数为160次。
资源调度装置将待被应用任务访问的次数不大于150的内存中的数据块确定为非热点数据块,即,将被任务队列中的应用任务访问的次数为100次的30个数据块确定为非热点数据块。
资源调度装置确定了非热点数据块后,向非点数据块的本地节点发送移出指令,非热点数据块的本地节点接收到该移出指令后,将非热点数据块从内存中移出到磁盘中。
本发明实施例还提供了一种资源调度装置,用于实现图1至图4所示的实施例所提供的方法,其基本结构请参阅图5,主要包括:
任务队列确定模块501,用于确定当前的任务队列;
任务队列确定模块501确定当前的任务队列,该任务队列中包括多个待执行的应用任务。
第一次数确定模块502,用于确定应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数;
任务队列中的应用任务需要访问位于磁盘中的数据块,本实施例中,第一 次数确定模块502确定任务队列的应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数。
热点数据确定模块503,用于根据每个数据块待被应用任务访问的次数,确定热点数据块;
热点数据确定模块503根据每个数据块待被应用任务访问的次数,确定热点数据块。热点数据块为磁盘中的数据块中,待被访问次数较多的数据块,其确定方法有很多,具体将在后面的实施例中详述,此处不做限定。
移入指令发送模块504,用于向热点数据块的本地节点发送移入指令。
热点数据确定模块503确定了热点数据块后,移入指令发送模块504向热点数据块的本地节点发送移入指令,该移入指令用于指示热点数据块的本地节点将热点数据块从磁盘中移入内存,使得热点数据块可以在内存中被访问。其中,热点数据块的本地节点指的是热点数据块所在的节点。其中,热点数据块的本地节点优先将热点数据块从磁盘中移入本地的内存中。
本实施例提供了一种资源调度方法,包括:任务队列确定模块501确定当前的任务队列;第一次数确定模块502确定应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数;热点数据确定模块503根据每个数据块待被应用任务访问的次数,确定热点数据块;移入指令发送模块504向热点数据块的本地节点发送移入指令,该移入指令用于指示将所述热点数据块移入内存,使得所述热点数据块可以在内存中被访问。本实施例通过当前的任务队列中的待执行的应用任务来确定数据块的热点程度,保证了被确定的热点数据块在移入内存中后,被待执行的应用任务访问的次数较多。与现有技术中根据历史被访问次数确定热点数据块相比,本实施例提供的资源调度装置能够显著的提升数据IO效率,进而缩短应用任务的运行时间,不会造成不必要的内存资源浪费。
图1所示的实施例给出了本发明实施例提供的资源调度装置基本结构,其中,热点数据确定模块503通过当前的任务队列中的待执行的应用任务来确定数据块的热点程度,其确定方法有很多。优选的,作为本发明的又一个实施例,热点数据确定模块503具体可以用于:将待被该应用任务访问的次数最高的前M个数据块确定为热点数据块;或将待被该应用任务访问的次数不小于N的 数据块确定为热点数据块。其中M、N均为预置数值。热点数据确定模块503也可以通过其它方法来根据每个数据块待被应用任务访问的次数确定热点数据块,此处不做限定。
优选的,作为本发明的又一个实施例,资源调度装置还可以包括任务节点调度模块,用于在移入指令发送模块504向热点数据块的本地节点发送移入指令后,当热点数据块的本地节点当前具有空闲的槽位(slot)时,则将热点数据块对应的应用任务调度到热点数据块的本地节点上,使得该应用任务无需跨节点访问热点数据块,提升了系统的数据IO效率。
图5所示的实施例详细解释了本发明提供的资源调度装置如何确定并调度热点数据块,下面将提供另一种资源调度装置,可以在图5所示的实施例的基础上进一步的实现对热点数据块对应的工作任务进行调度,请参阅图6。其基本结构包括:
任务队列确定模块601,用于确定当前的任务队列;
第一次数确定模块602,用于确定应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数;
热点数据确定模块603,用于根据每个数据块待被应用任务访问的次数,确定热点数据块;
移入指令发送模块604,用于向热点数据块的本地节点发送移入指令;
模块601至604与模块501至504基本相同,此处不做赘述。
第一顺序调度模块605,用于在移入指令发送模块604向热点数据块的本地节点发送移入指令之后,按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务。
移入指令发送模块604向热点数据块的本地节点发送移入指令后,热点数据块的本地节点会将热点数据块移入内存中。此时,任务队列中的应用任务可以直接从内存中访问热点数据块。本实施例中,资源调度装置还用于对工作队列中的应用任务的执行顺序进行调度,具体方法为:第一顺序调度模块605按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务。
本实施例提供了一种资源调度装置,包括:任务队列确定模块601确定当 前的任务队列;第一次数确定模块602确定应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数;热点数据确定模块603根据每个数据块待被应用任务访问的次数,确定热点数据块;移入指令发送模块604向热点数据块的本地节点发送移入指令,该移入指令用于指示将所述热点数据块移入内存,使得所述热点数据块可以在内存中被访问;第一顺序调度模块605按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务。本实施例通过当前的任务队列中的待执行的应用任务来确定数据块的热点程度,保证了被确定的热点数据块在移入内存中后,被待执行的应用任务访问的次数较多。与现有技术中根据历史被访问次数确定热点数据块相比,本实施例提供的资源调度装置能够显著的提升数据IO效率,进而缩短应用任务的运行时间,不会造成不必要的内存资源浪费。同时通过按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务,使得热点程度较高的数据块对应的任务可以先被执行,优化了工作队列中的应用任务的执行顺序,提升了系统执行应用任务的效率。
资源调度装置还可以通过其他方法对工作队列中的应用任务的执行顺序进行调度。优选的,作为本发明的又一实施例,资源调度装置还可以包括访问个数确定模块,用于确定每个应用任务所要访问的热点数据块的个数。第一顺序调度模块605可以被替换为第二顺序调度模块,用于在移入指令发送模块604向热点数据块的本地节点发送移入指令之后,按照要访问的热点数据块的个数从多到少的顺序,依次执行每个应用任务。这样就可以使得访问热点数据块的次数较多的应用任务先被执行,同样可以优化工作队列中的应用任务的执行顺序,提升系统执行应用任务的效率。
图6所示的实施例所提供的装置能够在将热点数据块移入内存中后,优化工作队列中的应用任务的执行顺序。但在实际应用中,内存所能容纳的数据块的个数是有限的。在本发明提供的资源调度装置的应用过程中,热点数据块会被源源不断的移入内存中,当移入内存中的数据块的个数达到内存所能容纳的上限时,内存就没有能力接纳新的热点数据块。为了使本发明提供的资源调度装置能够持续运行,需要保证内存具有足够的空间来容纳新的数据块。为此,本发明提供了一种新的实施例,用于保证节点的内存具有足够的空间来容纳新 的数据块,请参阅图7,本发明实施例提供的又一种资源调度装置的基本结构包括:
任务队列确定模块701,用于确定当前的任务队列;
第一次数确定模块702,用于确定应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数;
热点数据确定模块703,用于根据每个数据块待被应用任务访问的次数,确定热点数据块;
移入指令发送模块704,用于向热点数据块的本地节点发送移入指令;
第一顺序调度模块705,用于在移入指令发送模块704向热点数据块的本地节点发送移入指令之后,按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务。
模块701至705与模块601至605基本相同,此处不做赘述。
第二次数确定模块706,用于确定每个内存中的数据块待被应用任务访问的次数;
第二次数确定模块706确定位于内存中的数据块中,每个数据块待被任务队列中的应用任务访问的次数。
非热点数据确定模块707,用于根据每个内存中的数据块待被应用任务访问的次数,确定非热点数据块;
非热点数据确定模块707根据每个内存中的数据块待被应用任务访问的次数,确定非热点数据块,该非热点数据块用于表示位于内存中的数据块中,待被应用任务访问的次数较少的数据块。非热点数据块的确认方法有很多,例如:将待被应用任务访问的次数最少的前P个内存中的数据块确定为非热点数据块,其中P为预置数值,或,将待被应用任务访问的次数不大于Q的内存中的数据块确定为非热点数据块,其中Q为预置数值。非热点数据确定模块707也可以通过其他方法来确定非热点数据块,此处不做限定。
移出指令发送模块708,用于向非热点数据块的本地节点发送移出指令。
非热点数据确定模块707确定了非热点数据块后,移出指令发送模块708向非点数据块的本地节点发送移出指令,该移出指令用于指示非热点数据块的本地节点将非热点数据块从内存中移出到磁盘中。
本实施例提供了一种资源调度装置,包括:任务队列确定模块701确定当前的任务队列;第一次数确定模块702确定应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数;热点数据确定模块703根据每个数据块待被应用任务访问的次数,确定热点数据块;移入指令发送模块704向热点数据块的本地节点发送移入指令,该移入指令用于指示将所述热点数据块移入内存,使得所述热点数据块可以在内存中被访问;第一顺序调度模块705按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务;第二次数确定模块706确定每个内存中的数据块待被应用任务访问的次数;非热点数据确定模块707根据每个内存中的数据块待被应用任务访问的次数,确定非热点数据块;移出指令发送模块708向非热点数据块的本地节点发送移出指令,指示非热点数据块的本地节点将非热点数据块从内存中移出到磁盘中。本实施例通过当前的任务队列中的待执行的应用任务来确定数据块的热点程度,保证了被确定的热点数据块在移入内存中后,被待执行的应用任务访问的次数较多。与现有技术中根据历史被访问次数确定热点数据块相比,本实施例提供的资源调度装置能够显著的提升数据IO效率,进而缩短应用任务的运行时间,不会造成不必要的内存资源浪费。同时通过按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务,使得热点程度较高的数据块对应的任务可以先被执行,优化了工作队列中的应用任务的执行顺序,提升了系统执行应用任务的效率。非热点数据确定模块707还会确定内存中的非热点数据块,移出指令发送模块708指示非热点数据块的本地节点将非热点数据块从内存中移出,使得内存中保存的数据块均为热点程度较高的数据库,实现了内存中热点数据块的动态优化。
图5至图7所示的实施例提供的资源调度装置均根据工作队列来确定热点数据,下面将提供一种更为细化的资源调度装置,详细解释如何确定工作队列,其基本结构请参阅图8,包括:
任务队列确定模块801,用于确定当前的任务队列。本实施例中,任务队列确定模块801具体包括:
指令接收单元8011,用于接收预置时间段内,客户端下发的待执行的工作的执行指令;
指令接收单元8011接收预置时间段内,客户端下发的待执行的工作(Job)的执行指令,该执行指令用于指示资源调度装置执行待执行的工作。
其中,预置时间段可以为人为设置的时间段,也可以为资源调度装置默认的时间段,也可以为其他时间段,此处不做限定。
任务划分单元8012,用于将待执行的工作划分为多个待执行的应用任务,并将该多个待执行的应用任务的集合确定为当前的任务队列;
可以理解的,每个待执行的工作均可以被划分为一个或多个待执行的应用任务,每个待执行的应用任务需要访问一个数据块。例如,某个待执行的工作需要访问128M大小的数据文件,而分布式文件系统中每个数据块的大小为32M,于是资源调度装置将该待执行的工作划分为4个待执行的应用任务,每个待执行的应用任务用于访问一个32M的数据块。
本实施例中待执行的工作可以只包括一个工作,但是较为优选的,待执行的工作可以包括多个工作,任务划分单元8012将该多个工作划分为多个待执行的应用任务,并将该多个待执行的应用任务的集合确定为当前的任务队列。
第一次数确定模块802,用于确定应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数;
热点数据确定模块803,用于根据每个数据块待被应用任务访问的次数,确定热点数据块;
移入指令发送模块804,用于向热点数据块的本地节点发送移入指令;
第一顺序调度模块805,用于在移入指令发送模块804向热点数据块的本地节点发送移入指令之后,按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务。
第二次数确定模块806,用于确定每个内存中的数据块待被应用任务访问的次数;
非热点数据确定模块807,用于根据每个内存中的数据块待被应用任务访问的次数,确定非热点数据块;
移出指令发送模块808,用于向非热点数据块的本地节点发送移出指令。
模块802至808与模块702至708基本相同,此处不做赘述。
本实施例提供了一种资源调度装置,包括:指令接收单元8011接收预置 时间段内,客户端下发的待执行的工作的执行指令;任务划分单元8012将待执行的工作划分为多个待执行的应用任务,并将该多个待执行的应用任务的集合确定为当前的任务队列;第一次数确定模块802确定应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数;热点数据确定模块803根据每个数据块待被应用任务访问的次数,确定热点数据块;移入指令发送模块804向热点数据块的本地节点发送移入指令,该移入指令用于指示将所述热点数据块移入内存,使得所述热点数据块可以在内存中被访问;第一顺序调度模块805按照待被应用任务访问的次数从多到少的顺序,依次执行每个热点数据块对应的应用任务;第二次数确定模块806确定每个内存中的数据块待被应用任务访问的次数;非热点数据确定模块807根据每个内存中的数据块待被应用任务访问的次数,确定非热点数据块;移出指令发送模块808向非热点数据块的本地节点发送移出指令,指示非热点数据块的本地节点将非热点数据块从内存中移出到磁盘中。本实施例通过当前的任务队列中的待执行的应用任务来确定数据块的热点程度,保证了被确定的热点数据块在移入内存中后,被待执行的应用任务访问的次数较多。与现有技术中根据历史被访问次数确定热点数据块相比,本实施例提供的资源调度装置能够显著的提升数据IO效率,进而缩短应用任务的运行时间,不会造成不必要的内存资源浪费。同时,本实施例中第一顺序调度模块805还对工作队列中的应用任务的执行顺序进行调度,使得待被应用任务访问的次数较多的热点数据块对应的任务被优先执行。本实施例中非热点数据确定模块807还会确定内存中的非热点数据块,移出指令发送模块808指示非热点数据块的本地节点将非热点数据块从内存中移出,使得内存中保存的数据块均为热点程度较高的数据库,实现了内存中热点数据块的动态优化。
为了便于理解上述实施例,下面将以上述实施例的一个具体的应用场景为例进行描述。
在分布式文件系统中,指令接收单元8011接收预置时间段内,客户端下发的待执行的工作的执行指令,该执行指令用于指示资源调度装置执行待执行的工作,该待执行的工作需要访问128M大小的数据文件。
分布式文件系统中每个数据块的大小为32M,于是任务划分单元8012将 该待执行的工作划分为4个待执行的应用任务,每个待执行的应用任务用于访问一个32M的数据块。任务划分单元8012将该4个待执行的应用任务集合确定为当前的任务队列。
第一次数确定模块802确定任务队列的应用任务所要访问的磁盘中的数据块中,每个数据块待被应用任务访问的次数。得到共有100个数据块,其中20个数据块被访问了300次,30个数据块被访问了200次,50个数据块被访问了100次。
热点数据确定模块803将待被应用任务访问的次数不小于150的数据块确定为热点数据块,即,将被访问了300次的20个数据块与被访问了200次的30个数据块确定为热点数据块。
移入指令发送模块804向热点数据块的本地节点发送移入指令,热点数据块的本地节点接收到移入指令后,将热点数据块从磁盘中移入内存,使得热点数据块可以在内存中被访问。
第一顺序调度模块805按照待被应用任务访问的次数从多到少的顺序,先执行被访问了300次的20个数据块对应的应用任务,再执行被访问了300次的20个数据块对应的应用任务。
当前内存中已有60个数据块,第二次数确定模块806执行了热点数据块对应的应用任务后,确定位于内存中的数据块中,每个数据块待被任务队列中的应用任务访问的次数。得到有30个数据块被任务队列中的应用任务访问的次数为100次,有30个数据块被任务队列中的应用任务访问的次数为160次。
非热点数据确定模块807将待被应用任务访问的次数不大于150的内存中的数据块确定为非热点数据块,即,将被任务队列中的应用任务访问的次数为100次的30个数据块确定为非热点数据块。
非热点数据确定模块807确定了非热点数据块后,移出指令发送模块808向非点数据块的本地节点发送移出指令,非热点数据块的本地节点接收到该移出指令后,将非热点数据块从内存中移出到磁盘中。
上面从单元化功能实体的角度对本发明实施例中的资源调度装置进行了描述,下面从硬件处理的角度对本发明实施例中的资源调度装置进行描述,请参阅图9,本发明实施例中的资源调度装置900另一实施例包括:
输入装置901、输出装置902、处理器903和存储器904(其中资源调度装置900中的处理器903的数量可以一个或多个,图9中以一个处理器903为例)。在本发明的一些实施例中,输入装置901、输出装置902、处理器903和存储器904可通过总线或其它方式连接,其中,图9中以通过总线连接为例。
其中,通过调用存储器904存储的操作指令,处理器903用于执行如下步骤:
确定当前的任务队列,所述任务队列中包括多个待执行的应用任务;
确定所述应用任务所要访问的磁盘中的数据块中,每个数据块待被所述应用任务访问的次数;
根据所述每个数据块待被所述应用任务访问的次数,确定热点数据块;
向所述热点数据块的本地节点发送移入指令,所述移入指令用于指示将所述热点数据块移入内存,使得所述热点数据块可以在内存中被访问。
本发明的一些实施例中,处理器903还执行如下步骤:
将待被所述应用任务访问的次数最高的前M个数据块,确定为热点数据块,所述M为预置数值;
或,
将待被所述应用任务访问的次数不小于N的数据块,确定为热点数据块,所述N为预置数值。
本发明的一些实施例中,处理器903还执行如下步骤:
在向所述热点数据块的本地节点发送移入指令之后,若所述热点数据块的本地节点当前具有空闲的槽位slot,则将所述热点数据块对应的应用任务调度到所述热点数据块的本地节点上。
本发明的一些实施例中,处理器903还执行如下步骤:
在向所述热点数据块的本地节点发送移入指令之后,按照待被所述应用任务访问的次数从多到少的顺序,依次执行每个所述热点数据块对应的应用任务。
本发明的一些实施例中,处理器903还执行如下步骤:
确定每个所述应用任务所要访问的热点数据块的个数;
所述向所述热点数据块的本地节点发送移入指令之后还包括:
按照要访问的热点数据块的个数从多到少的顺序,依次执行每个所述应用任务。
本发明的一些实施例中,处理器903还执行如下步骤:
确定每个内存中的数据块待被所述应用任务访问的次数;
将待被所述应用任务访问的次数最少的前P个内存中的数据块确定为非热点数据块,所述P为预置数值,或,将待被所述应用任务访问的次数不大于Q的内存中的数据块确定为非热点数据块,所述Q为预置数值;
向所述非热点数据块的本地节点发送移出指令,所述移出指令用于指示将所述非热点数据块从内存中移出。
本发明的一些实施例中,处理器903还执行如下步骤:
接收预置时间段内,客户端下发的待执行的工作的执行指令;
将所述待执行的工作划分为多个待执行的应用任务,并将所述多个待执行的应用任务的集合确定为当前的任务队列。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,模块和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统和方法,可以通过其它的方式实现。例如,以上所描述的系统实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,模块或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元 中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。

Claims (21)

  1. 一种资源调度方法,其特征在于,包括:
    确定当前的任务队列,所述任务队列中包括多个待执行的应用任务;
    确定所述应用任务所要访问的磁盘中的数据块中,每个数据块待被所述应用任务访问的次数;
    根据所述每个数据块待被所述应用任务访问的次数,确定热点数据块;
    向所述热点数据块的本地节点发送移入指令,所述移入指令用于指示将所述热点数据块移入内存,使得所述热点数据块可以在内存中被访问。
  2. 根据权利要求1所述的资源调度方法,其特征在于,所述根据所述每个数据块待被所述应用任务访问的次数,确定热点数据块包括:
    将待被所述应用任务访问的次数最高的前M个数据块,确定为热点数据块,所述M为预置数值;
    或,
    将待被所述应用任务访问的次数不小于N的数据块,确定为热点数据块,所述N为预置数值。
  3. 根据权利要求1或2所述的资源调度方法,其特征在于,所述向所述热点数据块的本地节点发送移入指令之后还包括:
    若所述热点数据块的本地节点当前具有空闲的槽位slot,则将所述热点数据块对应的应用任务调度到所述热点数据块的本地节点上。
  4. 根据权利要求1或2所述的资源调度方法,其特征在于,所述向所述热点数据块的本地节点发送移入指令之后还包括:
    按照待被所述应用任务访问的次数从多到少的顺序,依次执行每个所述热点数据块对应的应用任务。
  5. 根据权利要求1或2所述的资源调度方法,其特征在于,所述方法还包括:
    确定每个所述应用任务所要访问的热点数据块的个数;
    所述向所述热点数据块的本地节点发送移入指令之后还包括:
    按照要访问的热点数据块的个数从多到少的顺序,依次执行每个所述应用任务。
  6. 根据权利要求1至5中任一项所述的资源调度方法,其特征在于,所述方法还包括:
    确定每个内存中的数据块待被所述应用任务访问的次数;
    将待被所述应用任务访问的次数最少的前P个内存中的数据块确定为非热点数据块,所述P为预置数值,或,将待被所述应用任务访问的次数不大于Q的内存中的数据块确定为非热点数据块,所述Q为预置数值;
    向所述非热点数据块的本地节点发送移出指令,所述移出指令用于指示将所述非热点数据块从内存中移出。
  7. 根据权利要求1至6中任一项所述的资源调度方法,其特征在于,所述确定当前的任务队列包括:
    接收预置时间段内,客户端下发的待执行的工作的执行指令;
    将所述待执行的工作划分为多个待执行的应用任务,并将所述多个待执行的应用任务的集合确定为当前的任务队列。
  8. 一种资源调度装置,其特征在于,包括:
    任务队列确定模块,用于确定当前的任务队列,所述任务队列中包括多个待执行的应用任务;
    第一次数确定模块,用于确定所述应用任务所要访问的磁盘中的数据块中,每个数据块待被所述应用任务访问的次数;
    热点数据确定模块,用于根据所述每个数据块待被所述应用任务访问的次数,确定热点数据块;
    移入指令发送模块,用于向所述热点数据块的本地节点发送移入指令,所述移入指令用于指示将所述热点数据块移入内存,使得所述热点数据块可以在内存中被访问。
  9. 根据权利要求8所述的资源调度装置,其特征在于,所述热点数据确定模块具体用于:
    将待被所述应用任务访问的次数最高的前M个数据块,确定为热点数据块,所述M为预置数值;
    或,
    将待被所述应用任务访问的次数不小于N的数据块,确定为热点数据块, 所述N为预置数值。
  10. 根据权利要求8或9所述的资源调度装置,其特征在于,所述装置还包括:
    任务节点调度模块,用于在所述移入指令发送模块向所述热点数据块的本地节点发送移入指令之后,当所述热点数据块的本地节点当前具有空闲的槽位slot时,将所述热点数据块对应的应用任务调度到所述热点数据块的本地节点上。
  11. 根据权利要求8或9所述的资源调度装置,其特征在于,所述装置还包括:
    第一顺序调度模块,用于在所述移入指令发送模块向所述热点数据块的本地节点发送移入指令之后,按照待被所述应用任务访问的次数从多到少的顺序,依次执行每个所述热点数据块对应的应用任务。
  12. 根据权利要求8或9所述的资源调度装置,其特征在于,所述装置还包括:
    访问个数确定模块,用于确定每个所述应用任务所要访问的热点数据块的个数;
    所述装置还包括:
    第二顺序调度模块,用于在所述移入指令发送模块向所述热点数据块的本地节点发送移入指令之后,按照要访问的热点数据块的个数从多到少的顺序,依次执行每个所述应用任务。
  13. 根据权利要求8至12中任一项所述的资源调度装置,其特征在于,所述装置还包括:
    第二次数确定模块,用于确定每个内存中的数据块待被所述应用任务访问的次数;
    非热点数据确定模块,用于将待被所述应用任务访问的次数最少的前P个内存中的数据块确定为非热点数据块,所述P为预置数值,或,将待被所述应用任务访问的次数不大于Q的内存中的数据块确定为非热点数据块,所述Q为预置数值;
    移出指令发送模块,用于向所述非热点数据块的本地节点发送移出指令, 所述移出指令用于指示将所述非热点数据块从内存中移出。
  14. 根据权利要求7至13中任一项所述的资源调度装置,其特征在于,所述任务队列确定模块包括:
    指令接收单元,用于接收预置时间段内,客户端下发的待执行的工作的执行指令;
    任务划分单元,用于将所述待执行的工作划分为多个待执行的应用任务,并将所述多个待执行的应用任务的集合确定为当前的任务队列。
  15. 一种资源调度装置,包括输入装置、输出装置、处理器和存储器,其特征在于,调用存储器存储的操作指令,所述处理器用于执行如下步骤:
    确定当前的任务队列,所述任务队列中包括多个待执行的应用任务;
    确定所述应用任务所要访问的磁盘中的数据块中,每个数据块待被所述应用任务访问的次数;
    根据所述每个数据块待被所述应用任务访问的次数,确定热点数据块;
    向所述热点数据块的本地节点发送移入指令,所述移入指令用于指示将所述热点数据块移入内存,使得所述热点数据块可以在内存中被访问。
  16. 根据权利要求15所述的资源调度装置,其特征在于,所述处理器还执行如下步骤:
    将待被所述应用任务访问的次数最高的前M个数据块,确定为热点数据块,所述M为预置数值;
    或,
    将待被所述应用任务访问的次数不小于N的数据块,确定为热点数据块,所述N为预置数值。
  17. 根据权利要求15或16所述的资源调度装置,其特征在于,所述处理器还执行如下步骤:
    在向所述热点数据块的本地节点发送移入指令之后,若所述热点数据块的本地节点当前具有空闲的槽位slot,则将所述热点数据块对应的应用任务调度到所述热点数据块的本地节点上。
  18. 根据权利要求15或16所述的资源调度装置,其特征在于,所述处理器还执行如下步骤:
    在向所述热点数据块的本地节点发送移入指令之后,按照待被所述应用任务访问的次数从多到少的顺序,依次执行每个所述热点数据块对应的应用任务。
  19. 根据权利要求15或16所述的资源调度装置,其特征在于,所述处理器还执行如下步骤:
    确定每个所述应用任务所要访问的热点数据块的个数;
    所述向所述热点数据块的本地节点发送移入指令之后还包括:
    按照要访问的热点数据块的个数从多到少的顺序,依次执行每个所述应用任务。
  20. 根据权利要求15至19中任一项所述的资源调度装置,其特征在于,所述处理器还执行如下步骤:
    确定每个内存中的数据块待被所述应用任务访问的次数;
    将待被所述应用任务访问的次数最少的前P个内存中的数据块确定为非热点数据块,所述P为预置数值,或,将待被所述应用任务访问的次数不大于Q的内存中的数据块确定为非热点数据块,所述Q为预置数值;
    向所述非热点数据块的本地节点发送移出指令,所述移出指令用于指示将所述非热点数据块从内存中移出。
  21. 根据权利要求15至20中任一项所述的资源调度装置,其特征在于,所述处理器还执行如下步骤:
    接收预置时间段内,客户端下发的待执行的工作的执行指令;
    将所述待执行的工作划分为多个待执行的应用任务,并将所述多个待执行的应用任务的集合确定为当前的任务队列。
PCT/CN2014/094581 2014-12-23 2014-12-23 一种资源调度方法以及相关装置 WO2016101115A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN201480077812.4A CN106462360B (zh) 2014-12-23 2014-12-23 Resource scheduling method and related apparatus
PCT/CN2014/094581 WO2016101115A1 (zh) 2014-12-23 2014-12-23 Resource scheduling method and related apparatus
EP14908686.0A EP3200083B1 (en) 2014-12-23 2014-12-23 Resource scheduling method and related apparatus
US15/584,661 US10430237B2 (en) 2014-12-23 2017-05-02 Resource scheduling method and related apparatus
US16/558,983 US11194623B2 (en) 2014-12-23 2019-09-03 Resource scheduling method and related apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/094581 WO2016101115A1 (zh) 2014-12-23 2014-12-23 Resource scheduling method and related apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/584,661 Continuation US10430237B2 (en) 2014-12-23 2017-05-02 Resource scheduling method and related apparatus

Publications (1)

Publication Number Publication Date
WO2016101115A1 true WO2016101115A1 (zh) 2016-06-30

Family

ID=56148859

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/094581 WO2016101115A1 (zh) 2014-12-23 2014-12-23 Resource scheduling method and related apparatus

Country Status (4)

Country Link
US (2) US10430237B2 (zh)
EP (1) EP3200083B1 (zh)
CN (1) CN106462360B (zh)
WO (1) WO2016101115A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112463754A (zh) * 2020-11-25 2021-03-09 上海哔哩哔哩科技有限公司 Data node switching method and apparatus in HDFS, and computer device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105511942B (zh) * 2015-12-02 2019-02-19 华为技术有限公司 Method and apparatus for identifying hotspot intermediate code in a language virtual machine
CN113590343A (zh) * 2020-04-30 2021-11-02 海南掌上能量传媒有限公司 Method for resolving uneven information proportion

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030191906A1 (en) * 2002-04-09 2003-10-09 Via Technologies, Inc. Data-maintenance method of distributed shared memory system
CN102508872A (zh) * 2011-10-12 2012-06-20 恒生电子股份有限公司 Memory-based data processing method and system for an online processing system
CN103440207A (zh) * 2013-07-31 2013-12-11 北京智谷睿拓技术服务有限公司 Caching method and apparatus
CN103838681A (zh) * 2012-11-27 2014-06-04 联想(北京)有限公司 Storage apparatus and data file access method

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6910106B2 (en) * 2002-10-04 2005-06-21 Microsoft Corporation Methods and mechanisms for proactive memory management
US8055689B1 (en) * 2005-03-14 2011-11-08 Oracle America, Inc. Methods and systems for distributing information model nodes in memory
US8307128B2 (en) * 2006-12-08 2012-11-06 International Business Machines Corporation System and method to improve sequential serial attached small computer system interface storage device performance
CN101431475B (zh) * 2008-11-20 2011-03-23 季鹏程 Configuration of a high-performance streaming media server and method for high-performance program reading
US8869151B2 (en) * 2010-05-18 2014-10-21 Lsi Corporation Packet draining from a scheduling hierarchy in a traffic manager of a network processor
US8438361B2 (en) * 2010-03-10 2013-05-07 Seagate Technology Llc Logical block storage in a storage device
US8601486B2 (en) * 2011-05-31 2013-12-03 International Business Machines Corporation Deterministic parallelization through atomic task computation
CN103186350B (zh) * 2011-12-31 2016-03-30 北京快网科技有限公司 Hybrid storage system and method for migrating hotspot data blocks
US20130212584A1 (en) * 2012-02-09 2013-08-15 Robert Bosch Gmbh Method for distributed caching and scheduling for shared nothing computer frameworks
US9552297B2 (en) * 2013-03-04 2017-01-24 Dot Hill Systems Corporation Method and apparatus for efficient cache read ahead
CN102902593B (zh) * 2012-09-28 2016-05-25 方正国际软件有限公司 Protocol distribution processing system based on a caching mechanism
US9292228B2 (en) * 2013-02-06 2016-03-22 Avago Technologies General Ip (Singapore) Pte. Ltd. Selective raid protection for cache memory
US9076530B2 (en) * 2013-02-07 2015-07-07 Seagate Technology Llc Non-volatile write buffer data retention pending scheduled verification
RU2538920C2 (ru) * 2013-05-06 2015-01-10 Общество с ограниченной ответственностью "Аби ИнфоПоиск" Способ распределения задач сервером вычислительной системы, машиночитаемый носитель информации и система для реализации способа
CN110825324B (zh) * 2013-11-27 2023-05-30 北京奥星贝斯科技有限公司 Hybrid storage control method and hybrid storage system

Also Published As

Publication number Publication date
EP3200083A1 (en) 2017-08-02
EP3200083A4 (en) 2017-11-15
US20170235602A1 (en) 2017-08-17
US10430237B2 (en) 2019-10-01
CN106462360B (zh) 2019-10-25
EP3200083B1 (en) 2019-03-13
US11194623B2 (en) 2021-12-07
US20190391847A1 (en) 2019-12-26
CN106462360A (zh) 2017-02-22

Similar Documents

Publication Publication Date Title
US9331943B2 (en) Asynchronous scheduling informed by job characteristics and anticipatory provisioning of data for real-time, parallel processing
US11275622B2 (en) Utilizing accelerators to accelerate data analytic workloads in disaggregated systems
CN105045607B (zh) 一种实现多种大数据计算框架统一接口的方法
CN106406987B (zh) 一种集群中的任务执行方法及装置
US20170185648A1 (en) Optimizing skewed joins in big data
US9449018B1 (en) File operation task optimization
US20180248934A1 (en) Method and System for a Scheduled Map Executor
JP6886964B2 (ja) 負荷平衡方法及び装置
WO2019001017A1 (zh) 集群间数据迁移方法、系统、服务器及计算机存储介质
CN104750690A (zh) 一种查询处理方法、装置及系统
US9836516B2 (en) Parallel scanners for log based replication
CN103412786A (zh) 一种高性能服务器架构系统及数据处理方法
US11194623B2 (en) Resource scheduling method and related apparatus
WO2022126863A1 (zh) 一种基于读写分离及自动伸缩的云编排系统及方法
US10241828B2 (en) Method and system for scheduling transactions in a data system
Petrov et al. Adaptive performance model for dynamic scaling Apache Spark Streaming
CN104461710A (zh) 任务处理方法及装置
CN105740249B (zh) 一种大数据作业并行调度过程中的处理方法及其系统
JP2015106219A (ja) 分散型データ仮想化システム、クエリ処理方法及びクエリ処理プログラム
Ehsan et al. Cost-efficient tasks and data co-scheduling with affordhadoop
Thaha et al. Hadoop in openstack: Data-location-aware cluster provisioning
Kaur et al. Image processing on multinode hadoop cluster
US9110823B2 (en) Adaptive and prioritized replication scheduling in storage clusters
US10346371B2 (en) Data processing system, database management system, and data processing method
US11388050B2 (en) Accelerating machine learning and profiling over a network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14908686

Country of ref document: EP

Kind code of ref document: A1

REEP Request for entry into the european phase

Ref document number: 2014908686

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE