WO2017045553A1 - Task allocation method and system - Google Patents

Task allocation method and system

Info

Publication number
WO2017045553A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
response time
processing
overhead information
time
Prior art date
Application number
PCT/CN2016/098187
Other languages
English (en)
French (fr)
Inventor
周祥
吉剑南
占超群
潘岳
廖宇俊
Original Assignee
阿里巴巴集团控股有限公司
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2017045553A1
Priority to US15/922,805 (US10936364B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/4887 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues involving deadlines, e.g. rate based, periodic
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003 Managing SLA; Interaction between SLA and QoS
    • H04L41/5019 Ensuring fulfilment of SLA
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80 Responding to QoS
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L67/61 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources taking into account QoS or priority requirements
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5012 Processor sets

Definitions

  • The present application relates to the field of data processing technologies, and in particular to a task allocation method and a task allocation system.
  • A real-time computing system usually refers to a system that offers real-time or near-real-time computing response time (RT) in a distributed environment.
  • Real-time computing systems typically place high demands on computing response time: the faster the response, the better.
  • The platform does not serve only one external application.
  • A faster response means a larger computing load on the cluster system per unit time; even in a single, physically isolated business service sub-cluster, the faster the response during peak access periods, the greater the computational pressure on the cluster system.
  • Pushed too far, this becomes counterproductive: fast responses also bring memory resource bottlenecks, garbage collection overhead, saturated network bandwidth, system resource preemption, and many other problems, so that the system response time becomes longer, or the system even stalls or times out.
  • A real-time computing system has time constraints, so if the response delay of a task is too long and exceeds the time constraint, problems such as task processing failure or system errors result.
  • The technical problem to be solved by the embodiments of the present application is to provide a task allocation method that stabilizes the response delay.
  • The embodiments of the present application further provide a task allocation system to ensure the implementation and application of the foregoing method.
  • The present application discloses a task allocation method, including: analyzing at least one query mode according to a target task, and acquiring an expected response time of the query mode; estimating system overhead information and an estimated response time according to the query mode and service description information, and estimating node overhead information for processing by each processing node of the system; selecting a processing node according to the node overhead information, and allocating a subtask of the target task to the selected processing node; and determining the remaining subtasks of the target task that have not been allocated, and scheduling the remaining subtasks according to the expected response time, the system overhead information, and the estimated response time.
  • Estimating the system overhead information and the estimated response time according to the query mode and the service description information includes: reading the service description information of the database table corresponding to each query mode, where the service description information describes the service data of the system; estimating, according to the service description information, the amount of data scanned for each query mode; and estimating the system overhead information and the estimated response time according to the data amount, where the system overhead information includes at least one of the following: processor overhead information, input/output overhead information, and network overhead information.
  • Before the node overhead information for each processing node is estimated, the method further includes: pre-delivering the subtasks of the target task to each processing node in the system.
  • Estimating the node overhead information for processing by each processing node of the system includes: for each processing node, determining the service description information of the database table corresponding to the query mode of the subtask; and estimating, according to the service description information, the node overhead information for that processing node to process the subtask.
  • Selecting a processing node according to the node overhead information and allocating a subtask of the target task to the selected processing node includes: determining, for each processing node, whether its node overhead information is lower than a node threshold; and, when the node overhead information of a processing node is lower than the node threshold, allocating the pre-delivered subtask to that processing node.
  • Scheduling the remaining subtasks according to the expected response time, the system overhead information, and the estimated response time includes: calculating the total processing time of the allocated subtasks; estimating a delay time for the remaining subtasks according to the expected response time, the system overhead information, the estimated response time, and the total processing time; and, after delaying the remaining subtasks by the delay time, allocating processing nodes for the remaining subtasks.
  • Estimating the delay time of the remaining subtasks according to the expected response time, the system overhead information, the estimated response time, and the total processing time includes: estimating the processing time of the remaining subtasks according to the estimated response time and the system overhead information; and calculating the delay time of the remaining subtasks according to the expected response time and the processing time of the remaining subtasks.
  • Allocating processing nodes for the remaining subtasks after the delay includes: starting to count the delay time once the allocated subtasks have been processed; and, when the delay time is reached, allocating processing nodes for the remaining subtasks.
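  • The delay calculation described above can be sketched as follows. The text does not give an exact formula, so this is only one plausible reading: the delay is taken as the slack left inside the expected response time after subtracting the total processing time of the allocated subtasks and the estimated processing time of the remaining subtasks. The function name, the millisecond units, and the clamp at zero are assumptions made for illustration.

```python
def estimate_delay_ms(expected_rt_ms: float,
                      allocated_total_ms: float,
                      remaining_processing_ms: float) -> float:
    """Hypothetical delay estimate for the remaining subtasks.

    Assumption (not stated in the source): the delay is the slack left in the
    expected response time once both batches have been accounted for.
    """
    slack = expected_rt_ms - allocated_total_ms - remaining_processing_ms
    # A negative slack means there is no room to wait, so do not delay at all.
    return max(0.0, slack)


# Illustrative numbers only: a 400 ms expected response time, 150 ms for the
# first batch, and 100 ms estimated for the remaining subtasks.
delay = estimate_delay_ms(400.0, 150.0, 100.0)
```

  • Clamping at zero keeps the sketch safe when the estimates already exceed the expected response time; how the described system handles that case is not specified in this passage.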
  • The method further includes: once the total processing time has elapsed, reclaiming any allocated subtasks that have not finished processing; and scheduling the reclaimed subtasks as remaining subtasks.
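  • A minimal sketch of this reclaim step, assuming each subtask carries a simple completion flag. The `Subtask` class, its fields, and the `reclaim_unfinished` helper are hypothetical names introduced for illustration; the source does not describe the data structures used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subtask:
    """Hypothetical subtask record; only the fields needed for the sketch."""
    task_id: str
    done: bool = False


def reclaim_unfinished(allocated: List[Subtask],
                       elapsed_ms: float,
                       total_processing_ms: float) -> List[Subtask]:
    """Return the allocated subtasks to reschedule as remaining subtasks.

    Nothing is reclaimed until the total processing time has elapsed; after
    that, every subtask that has not finished is handed back to the scheduler.
    """
    if elapsed_ms < total_processing_ms:
        return []
    return [task for task in allocated if not task.done]


batch = [Subtask("a", done=True), Subtask("b")]
late = reclaim_unfinished(batch, elapsed_ms=150.0, total_processing_ms=150.0)
```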
  • The embodiments of the present application further disclose a task allocation (scheduling) system, including: a query mode analysis module, configured to analyze at least one query mode according to a target task and acquire an expected response time of the query mode; a consumption and response time analysis module, configured to estimate system overhead information and an estimated response time according to the query mode and service description information, and to estimate node overhead information for processing by each processing node of the system; and a query task scheduling module, configured to select a processing node according to the node overhead information, allocate a subtask of the target task to the selected processing node, determine the remaining subtasks of the target task that have not been allocated, and schedule the remaining subtasks according to the expected response time, the system overhead information, and the estimated response time.
  • The consumption and response time analysis module includes: a reading submodule, configured to read the service description information of the database table corresponding to each query mode, where the service description information describes the service data of the system; a data amount estimation submodule, configured to estimate, according to the service description information, the amount of data scanned for each query mode; and an overhead and response estimation submodule, configured to estimate the system overhead information and the estimated response time according to the data amount, where the system overhead information includes at least one of the following: processor overhead information, input/output overhead information, and network overhead information.
  • The query task scheduling module is further configured to pre-deliver the subtasks of the target task to each processing node in the system.
  • The consumption and response time analysis module is configured to determine, for each processing node, the service description information of the database table corresponding to the query mode of the subtask, and to estimate, according to the service description information, the node overhead information for that processing node to process the subtask.
  • The query task scheduling module includes: a first allocation submodule, configured to determine, for each processing node, whether its node overhead information is lower than a node threshold, and, when the node overhead information of a processing node is lower than the node threshold, to allocate the pre-delivered subtask to that processing node.
  • The query task scheduling module includes: a delay calculation submodule, configured to calculate the total processing time of the allocated subtasks and to estimate the delay time of the remaining subtasks according to the expected response time, the system overhead information, the estimated response time, and the total processing time; and a second allocation submodule, configured to allocate processing nodes for the remaining subtasks after delaying the remaining subtasks by the delay time.
  • The delay calculation submodule is configured to estimate the processing time of the remaining subtasks according to the estimated response time and the system overhead information, and to calculate the delay time of the remaining subtasks according to the expected response time and the processing time of the remaining subtasks.
  • The second allocation submodule is configured to start counting the delay time once the allocated subtasks have been processed, and, when the delay time is reached, to allocate processing nodes for the remaining subtasks.
  • The system further includes: a long-tail processing module, configured to reclaim, once the total processing time has elapsed, any allocated subtask that has not finished processing; the query task scheduling module is further configured to schedule the reclaimed subtasks as remaining subtasks.
  • The embodiments of the present application include the following advantages:
  • The query mode and its corresponding expected response time can be determined. With the expected response time as a reference, the system overhead information and the estimated response time are estimated according to the query mode and the service description information, and the node overhead information for processing by each processing node of the system is estimated. Subtasks of the target task are allocated to the selected processing nodes, the remaining unallocated subtasks are determined, and the remaining subtasks are scheduled according to the expected response time, the system overhead information, and the estimated response time. The subtasks of the target task are thus processed in batches within the expected response time, which reduces system overhead while still meeting the expected response time, prevents the response time from becoming longer or the system from stalling because of overhead problems, and stabilizes the response delay.
  • FIG. 1 is a flow chart of the steps of an embodiment of the task allocation method of the present application.
  • FIG. 2 is a structural block diagram of a real-time computing system according to an embodiment of the present application.
  • FIG. 3 is a flow chart of the steps of another embodiment of the task allocation method of the present application.
  • FIG. 5 is a structural block diagram of an embodiment of the task allocation system of the present application.
  • FIG. 6 is a structural block diagram of another embodiment of the task allocation system of the present application.
  • The response delay is also called the response delay time. Because a real-time computing system has time constraints, a task whose response delay is too long and exceeds the time constraint of the real-time computing system causes problems such as task processing failure or system errors. One of the core concepts of the embodiments of the present application is therefore to provide a task allocation method and system that stabilize the response delay. When the real-time computing system processes a target task, the query mode and its corresponding expected response time can be determined, and, with the expected response time as a reference, the system overhead information and the estimated response time are estimated according to the query mode and the service description information.
  • The remaining subtasks are then scheduled so that the subtasks of the target task are processed in batches within the expected response time. This reduces system overhead while still meeting the expected response time, prevents the response time from becoming longer or the system from stalling because of overhead and other problems, and stabilizes the response delay.
  • Referring to FIG. 1, a flow chart of the steps of an embodiment of the task allocation method (also referred to as a task scheduling method) of the present application is shown. Specifically, the method may include the following steps:
  • Step 102: Analyze at least one query mode according to the target task, and acquire an expected response time of the query mode.
  • Each target task may be composed of multiple subtasks, so that the system can process multiple subtasks simultaneously using multiple processing nodes.
  • The target task may be a query task against a database, so the query mode of the target task can be analyzed. Different subtasks may have the same or different query modes, so at least one query mode can be identified for the target task. Query modes include: local data filtering (local filtering), index filtering, range query, and multi-value column query.
  • The service level agreement (SLA) of each query mode is determined, where the expected response time SLA is agreed upon with the user based on a series of factors, including: data volume, query mode, and use.
  • The SLA need not be an exact value.
  • A value RT_a is obtained from the history of the real-time computing system for the same type of query; RT_a is the expected response delay capability of the system for that type of query. The user obtains an acceptable expected value RT_b according to its business system. Where RT_b ≥ RT_a, RT_b can be used as the SLA, that is, the expected response time of each query mode.
  • In that case the expected response time of the service is no less than the expected response delay capability of the system for the same type of query. If RT_b < RT_a, the currently configured cluster resource allocation (number of devices, number of partitions) cannot meet the service requirements; the SLA needs to be renegotiated with the user, and resources (number of devices, number of partitions) allocated to reduce RT_a, until both parties agree and RT_b ≥ RT_a is satisfied.
  • For example, the user uses 50 computing nodes to load 1T of data into 100 partitions, so each computing node holds 2 partitions. The overall data filtering rate is 1/1000000. The system response delay capability RT_a is 100 ms, and the expected value RT_b that the user's service system can accept is 200 ms. Since RT_b > RT_a, the SLA of this query service is agreed to be 200 ms.
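  • The SLA agreement check in this example can be written as a small sketch. The function name is hypothetical, and the comparison is assumed to be "RT_b is at least RT_a", which is consistent with the 100 ms and 200 ms figures in the example; when the check fails, the sketch simply returns None to signal that resources or the SLA need to be revisited.

```python
from typing import Optional


def agree_sla_ms(rt_a_ms: float, rt_b_ms: float) -> Optional[float]:
    """Return RT_b as the SLA when the system capability RT_a can meet it.

    rt_a_ms: expected response delay capability of the system for this type
             of query, taken from history.
    rt_b_ms: expected value acceptable to the user's business system.
    Assumption: the SLA is accepted when rt_b_ms >= rt_a_ms.
    """
    if rt_b_ms >= rt_a_ms:
        return rt_b_ms
    # The cluster cannot meet the expectation as configured; the caller would
    # add resources to lower RT_a or renegotiate RT_b.
    return None


sla = agree_sla_ms(rt_a_ms=100.0, rt_b_ms=200.0)
```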
  • Step 104: Estimate system overhead information and an estimated response time according to the query mode and service description information, and estimate node overhead information for processing by each processing node of the system.
  • The service description information is statistical information related to the service, including, for example, the cardinality of the service data, the frequency of specific column values, histograms, and the like.
  • From this, the system overhead information and the estimated response time for executing the task can be estimated. It is also possible to estimate the node overhead information for each processing node once the subtasks have been sent to the processing nodes in the system.
  • The overhead information refers to the resource consumption required by devices in the system to process tasks, such as CPU consumption, IO consumption, and network consumption.
  • Step 106: Select a processing node according to the node overhead information, and allocate a subtask of the target task to the selected processing node.
  • A processing node may be selected according to the node overhead information, for example a processing node whose node overhead information meets the processing condition; the subtask of the target task is then allocated to the selected processing node, which processes the subtask.
  • Step 108: Determine the remaining subtasks of the target task that have not been allocated, and schedule the remaining subtasks according to the expected response time, the system overhead information, and the estimated response time.
  • The remaining subtasks of the target task that have not been allocated are determined from the allocated subtasks. The processing time of the remaining subtasks is then calculated according to the expected response time, the system overhead information, and the estimated response time, and the remaining subtasks are handled according to that processing time; for example, processing nodes are allocated for the remaining subtasks after a delay.
  • In summary, the query mode and its corresponding expected response time can be determined. With the expected response time as a reference, the system overhead information and the estimated response time are estimated according to the query mode and the service description information, and the node overhead information for processing by each processing node of the system is estimated. Subtasks of the target task are allocated to the selected processing nodes, the remaining unallocated subtasks are determined, and the remaining subtasks are scheduled according to the expected response time, the system overhead information, and the estimated response time.
  • This embodiment details the task allocation steps of the real-time computing system.
  • The real-time computing system includes: SLA Metadata, Query Pattern Analyzer, Statistics, Cost and RT Analyzer, Query Task Scheduler, Tail Tolerance Scheduler, System Status Monitor, and Computer Node Cluster.
  • The metadata module (SLA Metadata) stores the response time SLA of each specific query mode in all business-related real-time query data workloads.
  • The Query Pattern Analyzer performs fast pattern analysis on a received query. The main query modes include local filtering, index filtering, hash join, subquery, range query, multi-value column query, and so on.
  • The RT SLA of the service for a given query mode can be read from SLA Metadata and used as input to the Query Task Scheduler module.
  • The query mode is also used as input to the Cost and RT Analyzer module for overhead and RT estimation.
  • The statistics information module (Statistics) stores the service description information that describes the data of the service system.
  • The service description information includes cardinality, the frequency of specific column values, histograms, and so on; it may be updated periodically or in real time to reflect the current business data.
  • The cost and response time analysis module (Cost and RT Analyzer) analyzes the overhead information and the estimated response time.
  • The overhead information includes system overhead information and node overhead information. The Cost and RT Analyzer module reads the statistical information to estimate the costs of the query, such as central processing unit (CPU) overhead, input/output (IO) overhead, and network overhead, and to estimate the RT.
  • The Query Task Scheduler schedules the target task, that is, makes the scheduling decisions for its subtasks. It can read the query mode, the estimated query cost and RT, and the system running status information, and schedules the subtasks of the query accordingly.
  • The computing task of a distributed real-time computing system generally consists of many parallel subtasks. The Query Task Scheduler module therefore also coordinates with the Tail Tolerance Scheduler to perform long-tail processing when appropriate.
  • The Tail Tolerance Scheduler performs long-tail processing on subtasks in the system.
  • The System Status Monitor collects the running status of the processing nodes in real time.
  • The computing node cluster (Computer Node Cluster) handles query processing of subtasks; it includes a plurality of processing nodes.
  • The computing node cluster receives the subtask requests sent by the server and returns the calculation results.
  • The distributed computing node cluster is composed of multiple servers and is usually divided into multiple copies, each of which contains the complete data. Each copy is divided into multiple partitions, each partition contains part of the data, and partition queries can be executed in parallel.
  • One server (computing node) can store the data of one or more partitions.
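  • To make the copy, partition, and node relationship concrete, the following sketch lays out the partitions of one copy across computing nodes. Round-robin placement and the function name are assumptions made for illustration; the source only states that one node can hold the data of one or more partitions.

```python
from typing import Dict, List


def assign_partitions(num_nodes: int, num_partitions: int) -> Dict[int, List[int]]:
    """Place the partitions of one copy on nodes in round-robin order.

    Round-robin is an illustrative choice; the real placement policy is not
    described in the text.
    """
    layout: Dict[int, List[int]] = {node: [] for node in range(num_nodes)}
    for partition in range(num_partitions):
        layout[partition % num_nodes].append(partition)
    return layout


# 50 computing nodes and 100 partitions, as in the SLA example elsewhere in
# the text: every node ends up holding 2 partitions.
layout = assign_partitions(num_nodes=50, num_partitions=100)
```

  • Because each partition holds only part of the data, a query can be split into one subtask per partition and the subtasks executed in parallel on the nodes that hold them.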
  • On this basis, the subtasks can be scheduled reasonably while satisfying the agreed expected response time SLA, stabilizing the response time of the real-time computing system.
  • The specific steps are as follows:
  • Referring to FIG. 3, a flow chart of the steps of another embodiment of the task allocation method of the present application is shown, which may specifically include the following steps:
  • Step 302: Analyze at least one query mode according to the target task.
  • The database tables are queried according to the target task, and the query mode of the corresponding column of each database table is determined.
  • The query statement for the user database db1 is as follows:
  • The corresponding query modes can be identified from the database tables referenced by the query statement.
  • Step 304: Acquire an expected response time of the query mode.
  • The expected response time of each query mode is read from SLA Metadata, from which the total expected response time of the target task can be obtained.
  • In the above example, reading from the user's SLA Metadata gives a response delay RT of 400 ms for the database db1 query containing the above six query modes.
  • Step 306: Pre-deliver the subtasks of the target task to each processing node in the system.
  • the task delay is affected.
  • The subtasks can be pre-delivered to each processing node in the system, so that the response time and consumption of each subtask on each processing node can be calculated to decide whether to allocate the subtask.
  • Step 308: Read the service description information of the database table corresponding to each query mode.
  • Step 310: Estimate the amount of data corresponding to each query mode according to the service description information.
  • Step 312: Estimate the overhead information and the estimated response time according to the data amount.
  • The Cost and RT Analyzer reads from Statistics the service description information of the database table corresponding to each query mode, such as the cardinality of the data, the frequency of specific column values, and histograms. It then estimates, according to the service description information, the amount of data corresponding to each query mode, that is, the number of records to be scanned, and combines this with the CPU unit cost, the IO unit cost, and the network overhead unit cost to estimate the overhead information and the RT value.
  • The overhead information includes system overhead information and node overhead information; that is, the above process estimates the system overhead information and RT value, as well as the node overhead information and RT value required for each processing node to process a subtask.
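  • Taking the estimation of system overhead information as an example, the query cost and RT estimation can adopt the following formula: query cost = CPU overhead + IO overhead + network overhead.
  • Taking the query modes of the user's target task in the above example, the calculation covers the following six specific query modes, labeled A to F: (A) local data filtering on column col1 of table tab1 (= 'value1'); (B) local data filtering on column col2 of table tab2 (= 'value2'); (C) a multi-value column query on column col4 of table tab2 (four values); (D) a range query on column col5 of table tab2 (> 100000); (E) a subquery on column col6 of table tab2, matching a single-table query on tab3; (F) local data filtering on column col2 of table tab3 (= 'value7').
  • The service description information of the database table corresponding to each query mode is read from Statistics. Suppose the service description information for these query modes is as follows:
  • the cardinality of column col1 of table tab1 is 5;
  • the frequency of the value "value1" in column col1 of table tab1 is 10/100000000;
  • the cardinality of column col2 of table tab2 is 8;
  • the frequency of the value "value2" in column col2 of table tab2 is 1/100000000;
  • the cardinality of column col4 (a multi-value column) of table tab2 is 1000000;
  • the cardinality of column col5 of table tab2 is 10000;
  • values of column col5 of table tab2 greater than 100000 account for 20% of the total number of records;
  • the cardinality of column col2 of table tab3 is 10;
  • the frequency of the value "value7" in column col2 of table tab3 is 1/1000.
  • Based on this service description information and the specific query modes A to F, the number of records to be scanned is estimated and combined with the CPU, IO, and network overhead units to obtain the total overhead (that is, the system overhead information) and the RT value.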
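The estimation path described in steps 308 to 312 (statistics, then scanned records, then cost) can be illustrated with a minimal Python sketch. The table size, the per-record unit costs, and all function names below are assumptions made for illustration; the application does not specify them. Only the 20% share for tab2.col5 > 100000 is taken from the statistics listed above.

```python
from dataclasses import dataclass


@dataclass
class UnitCosts:
    """Hypothetical per-record cost units (arbitrary units, not from the application)."""
    cpu: float = 1.0
    io: float = 4.0
    network: float = 10.0


def estimate_scanned_rows(total_rows: int, selectivity: float) -> int:
    """Estimate the number of records a query mode scans from a value frequency."""
    return round(total_rows * selectivity)


def estimate_query_cost(scanned_rows: int, returned_rows: int, units: UnitCosts) -> float:
    """Query cost = CPU overhead + IO overhead + network overhead."""
    cpu_cost = scanned_rows * units.cpu
    io_cost = scanned_rows * units.io
    network_cost = returned_rows * units.network
    return cpu_cost + io_cost + network_cost


# Assumed table size; the 20% selectivity is the range-query statistic for tab2.col5.
TAB2_ROWS = 1_000_000
rows_d = estimate_scanned_rows(TAB2_ROWS, 0.20)
cost_d = estimate_query_cost(rows_d, returned_rows=rows_d, units=UnitCosts())
```

A real analyzer would presumably also map the cost to an RT value using historical measurements; that mapping is left out here because the text does not describe it.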
  • Step 314: Determine, for each processing node, whether its node overhead information is lower than a node threshold.
  • The query task scheduling module can read the query mode, the estimated overhead information, and the RT value, and read the system running status information from the System Status Monitor, in order to schedule subtasks.
  • A node threshold is configured in the system in this embodiment and is used to measure a processing node's capacity to process tasks. The node overhead information of each processing node can therefore be compared with the node threshold to determine whether subtasks can be allocated to that node.
  • If yes, that is, the node overhead information of the processing node is lower than the node threshold, step 316 is performed; if not, that is, the node overhead information of the processing node is not lower than the node threshold, step 322 is performed.
  • Step 316: Allocate the pre-delivered subtask to the processing node.
  • When the node overhead information of a processing node is lower than the node threshold, the node is capable of processing subtasks in real time, so the previously pre-delivered subtask is allocated to it; other subtasks can of course also be allocated to it. The subtask is then processed by that processing node.
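The threshold check in steps 314 and 316 can be sketched as a simple partition of the processing nodes. This is only an illustration: the node names, the cost scale, and the threshold value are hypothetical, and the application does not say how node overhead is normalized.

```python
from typing import Dict, List, Tuple


def split_by_node_threshold(
    node_costs: Dict[str, float], node_threshold: float
) -> Tuple[List[str], List[str]]:
    """Return (assignable, deferred) node ids.

    A node whose estimated overhead is strictly lower than the threshold keeps
    its pre-delivered subtask (step 316); for the other nodes the subtask is
    left among the remaining subtasks (step 322).
    """
    assignable = [n for n, c in node_costs.items() if c < node_threshold]
    deferred = [n for n, c in node_costs.items() if c >= node_threshold]
    return assignable, deferred


# Hypothetical estimated node overheads for three pre-delivered subtasks.
costs = {"node-1": 0.35, "node-2": 0.80, "node-3": 0.62}
ready, waiting = split_by_node_threshold(costs, node_threshold=0.70)
```

Here `ready` holds the nodes that receive their subtask immediately, and `waiting` holds the nodes whose subtasks go through the delayed path.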
  • Step 318: Determine whether to perform long tail processing.
  • After a subtask is allocated to a processing node, the node needs to finish processing it within a specified time, such as the expected response time. The expected response time may be the time for that subtask alone, or the total time for all subtasks delivered at the same time. If the subtask is not completed within the specified time, long tail processing can be triggered, for example by re-delivering the subtask for reprocessing.
  • If yes, that is, long tail processing is performed, step 320 is performed; if not, that is, long tail processing is not performed, step 322 is performed.
  • Step 320: Reclaim the subtask.
  • After the total processing time is reached, if an allocated subtask has not been completed, long tail processing can be performed, that is, the subtask is reclaimed and re-delivered for processing.
  • If long tail processing occurs, the reclaimed subtasks are added to the remaining subtasks.
  • In this embodiment, scheduling the remaining subtasks according to the expected response time, the system overhead information, and the estimated response time includes: counting the total processing time corresponding to the allocated subtasks; estimating a delay time for the remaining subtasks according to the expected response time, the system overhead information, the estimated response time, and the total processing time; and, after delaying the remaining subtasks by the delay time, allocating processing nodes for the remaining subtasks.
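The long tail step (steps 318 to 320) amounts to moving unfinished subtasks back into the pool of remaining subtasks once the total processing time has elapsed. The sketch below shows one possible shape of that bookkeeping; the `Subtask` type and the function name are hypothetical, and the text does not prescribe any particular data structure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subtask:
    task_id: int
    done: bool = False


def reclaim_long_tail(
    allocated: List[Subtask],
    remaining: List[Subtask],
    elapsed_ms: float,
    total_processing_ms: float,
) -> Tuple[List[Subtask], List[Subtask]]:
    """Reclaim unfinished allocated subtasks after the total processing time.

    Returns (still_allocated, remaining). Before the deadline nothing changes;
    at or after it, unfinished subtasks are appended to the remaining list.
    """
    if elapsed_ms < total_processing_ms:
        return allocated, remaining
    finished = [t for t in allocated if t.done]
    stragglers = [t for t in allocated if not t.done]
    return finished, remaining + stragglers


allocated = [Subtask(1, done=True), Subtask(2, done=False)]
remaining = [Subtask(3)]
kept, pending = reclaim_long_tail(allocated, remaining, elapsed_ms=50, total_processing_ms=50)
```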
  • Step 322: Estimate the delay time of the remaining subtasks.
  • Step 324: Count the delay time.
  • Step 326: When the delay time is reached, allocate processing nodes for the remaining subtasks for processing.
  • In an optional embodiment of the present application, estimating the delay time of the remaining subtasks according to the expected response time, the system overhead information, the estimated response time, and the total processing time includes: estimating the processing time of the remaining subtasks according to the estimated response time and the system overhead information; and calculating the delay time of the remaining subtasks according to the expected response time and the processing time of the remaining subtasks.
  • In another optional embodiment of the present application, allocating processing nodes for the remaining subtasks after delaying them by the delay time includes: starting to count the delay time after the allocated subtasks have been processed; and, when the delay time is reached, allocating processing nodes for the remaining subtasks for processing.
  • Delay processing can be applied to the remaining subtasks. First, the total processing time corresponding to the allocated subtasks is counted, for example by estimating it from the expected response time of each subtask. The delay time of the remaining subtasks is then estimated according to the expected response time, the system overhead information, the estimated response time, and the total processing time. That is, the processing time of the remaining subtasks is estimated from the estimated response time and the system overhead information, and the delay time of the remaining subtasks is then estimated from the total expected response time of the target task, the processing time of the remaining subtasks, and the total processing time of the allocated subtasks.
  • After the allocated subtasks have been processed or the total processing time has been reached, the delay time starts to be counted, for example by starting a timer. When the delay time is reached, processing nodes are allocated for the remaining subtasks; for example, the remaining subtasks can be allocated to the processing nodes to which they were previously pre-delivered, or to other relatively idle processing nodes.
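The timer-based delivery in steps 324 and 326 could look like the following sketch, which uses Python's standard `threading.Timer`. The function name, the subtask identifiers, and the 10 ms delay are illustrative assumptions; a production scheduler would more likely use its own event loop than one thread per timer.

```python
import threading
from typing import Callable, List


def dispatch_after_delay(
    delay_ms: float,
    remaining: List[str],
    dispatch: Callable[[List[str]], None],
) -> threading.Timer:
    """Start a one-shot timer that hands the remaining subtasks to `dispatch`."""
    timer = threading.Timer(delay_ms / 1000.0, dispatch, args=(remaining,))
    timer.start()
    return timer


# Hypothetical dispatch target: collect the delivered subtask ids in a list.
dispatched: List[str] = []
timer = dispatch_after_delay(10, ["subtask-61", "subtask-62"], dispatched.extend)
timer.join()  # wait for the timer thread so the result can be inspected
```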
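  • For example, suppose the SLA agreed for the query modes in the above example is 200 ms, and the target task has a total of 100 subtasks.
  • After the cost estimation, it is determined that the node overhead information of 60 processing nodes is lower than the node threshold.
  • 60 subtasks are first delivered to the corresponding computing nodes, and their total processing time is calculated to be 50 ms.
  • The system running state, data volume, and task cost of the computing nodes hosting the 60 allocated subtasks are then evaluated and compared with the system running state, data volume, and task cost of the target nodes of the remaining 40 subtasks. The processing time of the 40 subtasks is estimated at about 100 ms; this processing time can be estimated using a linear relationship.
  • Delay control is then applied to the remaining subtasks. The delay before delivering the remaining 40 subtasks is 200 - 50 - 100 - T_schedule_cost, where T_schedule_cost is an empirical value, based on historical statistics, for the computational overhead of the decision and scheduling algorithm itself. It is generally very short; if T_schedule_cost is 1 ms, the delivery delay is 49 ms.
  • The delay can start to be counted once the 60 subtasks have completed and returned, or once long tail processing has reclaimed subtasks. After waiting 49 ms, the remaining uncompleted subtasks are delivered. From the perspective of the computing node cluster, this 49 ms delay effectively balances and relieves the resource usage of the real-time computing system.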
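The delay for the remaining subtasks in this example follows directly from the description's figures: an agreed SLA of 200 ms, 50 ms for the allocated subtasks, about 100 ms estimated for the remaining ones, and an empirical scheduling overhead T_schedule_cost of 1 ms. The sketch below reproduces that arithmetic. The function name is hypothetical, and clamping a negative result to zero is an added assumption that the text does not discuss.

```python
def remaining_delay_ms(
    sla_ms: float,
    allocated_total_ms: float,
    remaining_estimate_ms: float,
    schedule_cost_ms: float,
) -> float:
    """Delay = SLA - allocated processing time - remaining estimate - scheduling cost."""
    delay = sla_ms - allocated_total_ms - remaining_estimate_ms - schedule_cost_ms
    # Assumption: if the budget is already used up, deliver immediately.
    return max(delay, 0.0)


delay = remaining_delay_ms(
    sla_ms=200, allocated_total_ms=50, remaining_estimate_ms=100, schedule_cost_ms=1
)
```

With these inputs the delay is 49 ms, which is the slack the scheduler can spend waiting without breaching the SLA.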
  • As shown in FIG. 4, a simulation compares, without horizontally scaling the cluster system, the system response time curves of an existing real-time computing system, a system using only long tail processing, and the cluster of the embodiment of the present application as the access volume grows.
  • In FIG. 4, the dashed line represents the existing real-time computing system. As the number of concurrent accesses grows, the average RT gradually increases; when the system bottleneck is reached, the RT rises sharply and the system eventually hangs.
  • The dot-dash line represents the long tail processing system. Within the bottleneck threshold of concurrent access, its average RT is relatively stable, but the system bottleneck threshold is not improved; because of the retransmission strategy, the system may even reach the bottleneck threshold earlier.
  • The solid line represents the real-time computing system of this embodiment. Although the average RT of the system is higher, it stays within the SLA range required by the service, overall system resource utilization is relatively smooth, growth in access has little effect on the system RT, and the system bottleneck threshold is raised.
  • In summary, the embodiment of the present application provides a new service-access SLA technique that, provided the service's SLA for real-time computing responses is met, lets the system keep the response latency of real-time computing requests as stable as possible and raises the system bottleneck threshold. It mainly includes two steps: 1) estimating query overhead information and response latency based on the query mode and the service description information; 2) online scheduling of computing tasks based on the service SLA.
  • First, for different query modes, a fast pattern-matching analysis of the query is performed using the data volume, the service description information, and the response times of similar historical query modes. This yields approximate overhead information (such as CPU overhead and IO overhead) and response latency for subsequent online scheduling decisions.
  • Second, a real-time query computing service differs from an offline data warehouse: the query modes contained in its query workload are generally known in advance, and the response latency is required to be at the millisecond level (a few ms to hundreds of ms) or within tens of seconds. The response time and overhead of these queries can therefore be estimated fairly accurately, so that a response latency SLA can be agreed with the user for each query mode (for queries of the same kind, usually the longest response latency under the resource configuration the user requested). Based on the SLA, the system performs flexible online scheduling of computing tasks and can combine this with long tail processing to optimize resource utilization and raise the system bottleneck threshold, thereby stabilizing response latency.
  • On the basis of the above embodiments, this embodiment further provides a task allocation (also referred to as scheduling) system.
  • Referring to FIG. 5, a structural block diagram of an embodiment of a task allocation system of the present application is shown, which may specifically include the following modules:
  • The query mode analysis module 502 is configured to analyze at least one query mode according to the target task, and obtain an expected response time of the query mode.
  • The consumption and response time analysis module 504 is configured to estimate system overhead information and an estimated response time according to the query mode and the service description information, and to estimate the node overhead information for processing by each processing node of the system.
  • The query task scheduling module 506 is configured to select processing nodes according to the node overhead information and allocate subtasks of the target task to the selected processing nodes; and to determine the remaining unallocated subtasks of the target task and schedule them according to the expected response time, the system overhead information, and the estimated response time.
  • In this embodiment, the task allocation (scheduling) system can be regarded as a subsystem of the real-time computing system. The query mode analysis module 502 can obtain the expected response time of the query mode from the metadata module. The connections between the modules of the task scheduling system and the other modules of the real-time computing system are as shown in FIG. 4 and the corresponding embodiment, and are not described again here.
  • In summary, when a real-time computing system processes a target task, the query mode and its corresponding expected response time can be determined. With the expected response time as the baseline, the system overhead information and the estimated response time are estimated according to the query mode and the service description information, and the node overhead information for each processing node of the system is estimated. Subtasks of the target task are allocated to the selected processing nodes, the remaining unallocated subtasks are determined, and the remaining subtasks are scheduled according to the expected response time, the system overhead information, and the estimated response time. The subtasks of the target task are thus processed in batches within the expected response time, which reduces system overhead while meeting the expected response time, prevents overhead-related problems from lengthening the response time or hanging the system, and stabilizes response latency.
  • Referring to FIG. 6, a structural block diagram of another embodiment of a task scheduling system of the present application is shown, which may specifically include the following modules:
  • The query mode analysis module 602 is configured to analyze at least one query mode according to the target task, and obtain an expected response time of the query mode.
  • The consumption and response time analysis module 604 is configured to estimate system overhead information and an estimated response time according to the query mode and the service description information, and to estimate the node overhead information for processing by each processing node of the system.
  • The query task scheduling module 606 is configured to select processing nodes according to the node overhead information and allocate subtasks of the target task to the selected processing nodes; and to determine the remaining unallocated subtasks of the target task and schedule them according to the expected response time, the system overhead information, and the estimated response time.
  • The long tail processing module 608 is configured to reclaim an allocated subtask if it has not been completed after the total processing time is reached; the query task scheduling module 606 is further configured to schedule the reclaimed subtasks as remaining subtasks.
  • The consumption and response time analysis module 604 includes: a reading sub-module 60402, configured to read the service description information of the database table corresponding to each query mode, where the service description information describes the service data of the system;
  • a data amount estimation sub-module 60404, configured to estimate, according to the service description information, the amount of data to be scanned for each query mode; and
  • a cost and response estimation sub-module 60406, configured to estimate the system overhead information and the estimated response time according to the data amount, where the system overhead information includes at least one of the following: processor overhead information, input/output overhead information, and network overhead information.
  • The query task scheduling module 606 is further configured to pre-deliver the subtasks of the target task to each processing node in the system.
  • The consumption and response time analysis module 604 is configured to determine, for each processing node, the service description information according to the database table corresponding to the query mode of the subtask; and to estimate, according to the service description information, the node overhead information for the processing node to process the subtask.
  • The query task scheduling module 606 includes a first allocation sub-module 60602, a delay calculation sub-module 60604, and a second allocation sub-module 60606.
  • The first allocation sub-module 60602 is configured to determine, for each processing node, whether its node overhead information is lower than a node threshold; and, when the node overhead information of the processing node is lower than the node threshold, to allocate the pre-delivered subtask to the processing node.
  • The delay calculation sub-module 60604 is configured to count the total processing time corresponding to the allocated subtasks; and to estimate the delay time of the remaining subtasks according to the expected response time, the system overhead information, the estimated response time, and the total processing time.
  • The second allocation sub-module 60606 is configured to allocate processing nodes for the remaining subtasks after delaying them by the delay time.
  • The delay calculation sub-module 60604 is configured to estimate the processing time of the remaining subtasks according to the estimated response time and the system overhead information; and to calculate the delay time of the remaining subtasks according to the expected response time and the processing time of the remaining subtasks.
  • The second allocation sub-module 60606 is configured to start counting the delay time after the allocated subtasks have been processed; and, when the delay time is reached, to allocate processing nodes for the remaining subtasks for processing.
  • The description of the system embodiment is relatively brief; for relevant details, refer to the description of the method embodiment.
  • Embodiments of the present application can be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, embodiments of the present application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • The computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • The memory may include non-persistent memory in a computer readable medium, random access memory (RAM), and/or non-volatile memory such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology.
  • The information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic tape cartridges, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device.
  • As defined herein, computer readable media do not include transitory computer readable media, such as modulated data signals and carrier waves.
  • Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions.
  • These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction device. The instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

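A task allocation method and system for stabilizing response latency. The method includes: analyzing at least one query mode according to a target task, and obtaining an expected response time of the query mode (102); estimating system overhead information and an estimated response time according to the query mode and service description information, and estimating node overhead information for processing by each processing node of the system (104); selecting processing nodes according to the node overhead information, and allocating subtasks of the target task to the selected processing nodes (106); and determining the remaining unallocated subtasks of the target task, and scheduling the remaining subtasks according to the expected response time, the system overhead information, and the estimated response time (108). The subtasks of the target task are processed in batches within the expected response time, which reduces system overhead while meeting the expected response time, prevents overhead-related problems from lengthening the response time or hanging the system, and stabilizes the response time.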

Description

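Task allocation method and system
This application claims priority to Chinese Patent Application No. 201510587698.6, filed on September 15, 2015 and entitled "Task Allocation Method and System", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of data processing technologies, and in particular to a task allocation method and a task allocation system.
Background
A real-time computing system generally refers to a system in a distributed environment that has a real-time or near-real-time computing response time (RT).
Viewed solely from the perspective of an external application system, a real-time computing system is usually expected to respond quickly, the faster the better. From the perspective of the computing system itself, however, the platform does not serve only one external application. When it serves multiple external application services at the same time, faster responses mean a greater computing load on the cluster per unit of time. Even within a physically isolated sub-cluster serving a single business, faster responses during peak access periods put greater computing pressure on the cluster. In such cases the effect is often counterproductive: fast responses also bring memory resource bottlenecks, garbage collection overhead, saturated network bandwidth, contention for system resources, and many other problems, which instead lengthen system response times or even cause hangs or timeouts. Moreover, a real-time computing system is subject to time constraints, so if a task's response time is delayed too long and exceeds the constraint, the task may fail or the system may report errors.
Therefore, a technical problem urgently to be solved by those skilled in the art is to provide a task allocation method and system that stabilize response latency.
Summary
The technical problem to be solved by the embodiments of the present application is to provide a task allocation method that stabilizes response latency.
Correspondingly, the embodiments of the present application further provide a task allocation system to ensure the implementation and application of the above method.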
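To solve the above problem, the present application discloses a task allocation method, including: analyzing at least one query mode according to a target task, and obtaining an expected response time of the query mode; estimating system overhead information and an estimated response time according to the query mode and service description information, and estimating node overhead information for processing by each processing node of the system; selecting processing nodes according to the node overhead information, and allocating subtasks of the target task to the selected processing nodes; and determining the remaining unallocated subtasks of the target task, and scheduling the remaining subtasks according to the expected response time, the system overhead information, and the estimated response time.
Optionally, estimating the system overhead information and the estimated response time according to the query mode and the service description information includes: reading the service description information of the database table corresponding to each query mode, where the service description information describes the service data of the system; estimating, according to the service description information, the amount of data to be scanned for each query mode; and estimating the system overhead information and the estimated response time according to the data amount, where the system overhead information includes at least one of the following: processor overhead information, input/output overhead information, and network overhead information.
Optionally, before estimating the node overhead information for each processing node of the system, the method further includes: pre-delivering the subtasks of the target task to each processing node in the system.
Optionally, estimating the node overhead information for each processing node of the system includes: for each processing node, determining the service description information according to the database table corresponding to the query mode of the subtask; and estimating, according to the service description information, the node overhead information for the processing node to process the subtask.
Optionally, selecting processing nodes according to the node overhead information and allocating subtasks of the target task to the selected processing nodes includes: determining, for each processing node, whether its node overhead information is lower than a node threshold; and, when the node overhead information of the processing node is lower than the node threshold, allocating the pre-delivered subtask to the processing node.
Optionally, scheduling the remaining subtasks according to the expected response time, the system overhead information, and the estimated response time includes: counting the total processing time corresponding to the allocated subtasks; estimating the delay time of the remaining subtasks according to the expected response time, the system overhead information, the estimated response time, and the total processing time; and, after delaying the remaining subtasks by the delay time, allocating processing nodes for the remaining subtasks.
Optionally, estimating the delay time of the remaining subtasks according to the expected response time, the system overhead information, the estimated response time, and the total processing time includes: estimating the processing time of the remaining subtasks according to the estimated response time and the system overhead information; and calculating the delay time of the remaining subtasks according to the expected response time and the processing time of the remaining subtasks.
Optionally, allocating processing nodes for the remaining subtasks after delaying them by the delay time includes: starting to count the delay time after the allocated subtasks have been processed; and, when the delay time is reached, allocating processing nodes for the remaining subtasks for processing.
Optionally, the method further includes: after the total processing time is reached, if an allocated subtask has not been completed, reclaiming the subtask; and scheduling the reclaimed subtask as a remaining subtask.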
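The embodiments of the present application further disclose a task allocation (scheduling) system, including: a query mode analysis module, configured to analyze at least one query mode according to a target task and obtain an expected response time of the query mode; a consumption and response time analysis module, configured to estimate system overhead information and an estimated response time according to the query mode and service description information, and to estimate node overhead information for processing by each processing node of the system; and a query task scheduling module, configured to select processing nodes according to the node overhead information and allocate subtasks of the target task to the selected processing nodes, and to determine the remaining unallocated subtasks of the target task and schedule them according to the expected response time, the system overhead information, and the estimated response time.
Optionally, the consumption and response time analysis module includes: a reading sub-module, configured to read the service description information of the database table corresponding to each query mode, where the service description information describes the service data of the system; a data amount estimation sub-module, configured to estimate, according to the service description information, the amount of data to be scanned for each query mode; and a cost and response estimation sub-module, configured to estimate the system overhead information and the estimated response time according to the data amount, where the system overhead information includes at least one of the following: processor overhead information, input/output overhead information, and network overhead information.
Optionally, the query task scheduling module is further configured to pre-deliver the subtasks of the target task to each processing node in the system.
Optionally, the consumption and response time analysis module is configured to determine, for each processing node, the service description information according to the database table corresponding to the query mode of the subtask; and to estimate, according to the service description information, the node overhead information for the processing node to process the subtask.
Optionally, the query task scheduling module includes: a first allocation sub-module, configured to determine, for each processing node, whether its node overhead information is lower than a node threshold, and, when the node overhead information of the processing node is lower than the node threshold, to allocate the pre-delivered subtask to the processing node.
Optionally, the query task scheduling module includes: a delay calculation sub-module, configured to count the total processing time corresponding to the allocated subtasks, and to estimate the delay time of the remaining subtasks according to the expected response time, the system overhead information, the estimated response time, and the total processing time; and a second allocation sub-module, configured to allocate processing nodes for the remaining subtasks after delaying them by the delay time.
Optionally, the delay calculation sub-module is configured to estimate the processing time of the remaining subtasks according to the estimated response time and the system overhead information; and to calculate the delay time of the remaining subtasks according to the expected response time and the processing time of the remaining subtasks.
Optionally, the second allocation sub-module is configured to start counting the delay time after the allocated subtasks have been processed; and, when the delay time is reached, to allocate processing nodes for the remaining subtasks for processing.
Optionally, the system further includes: a long tail processing module, configured to reclaim an allocated subtask if it has not been completed after the total processing time is reached; the query task scheduling module is further configured to schedule the reclaimed subtask as a remaining subtask.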
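Compared with the prior art, the embodiments of the present application have the following advantages:
In the embodiments of the present application, when a real-time computing system processes a target task, the query mode and its corresponding expected response time can be determined. With the expected response time as the baseline, the system overhead information and the estimated response time are estimated according to the query mode and the service description information, and the node overhead information for each processing node of the system is estimated. Subtasks of the target task are allocated to the selected processing nodes, the remaining unallocated subtasks are determined, and the remaining subtasks are scheduled according to the expected response time, the system overhead information, and the estimated response time. The subtasks of the target task are thus processed in batches within the expected response time, which reduces system overhead while meeting the expected response time, prevents overhead-related problems from lengthening the response time or hanging the system, and stabilizes response latency.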
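Brief Description of the Drawings
FIG. 1 is a flowchart of the steps of an embodiment of a task allocation method of the present application;
FIG. 2 is a structural block diagram of a real-time computing system according to an embodiment of the present application;
FIG. 3 is a flowchart of the steps of another embodiment of a task allocation method of the present application;
FIG. 4 is a schematic comparison of the response times of different cluster systems in an embodiment of the present application;
FIG. 5 is a structural block diagram of an embodiment of a task allocation system of the present application;
FIG. 6 is a structural block diagram of another embodiment of a task allocation system of the present application.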
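Detailed Description
To make the above objects, features, and advantages of the present application clearer and easier to understand, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.
Response latency is also called response delay time. Because a real-time computing system is subject to time constraints, a task whose response time is delayed too long and exceeds those constraints may fail or cause system errors. One of the core ideas of the embodiments of the present application is therefore to provide a task allocation method and system that stabilize response latency. When a real-time computing system processes a target task, the query mode and its corresponding expected response time can be determined. With the expected response time as the baseline, the system overhead information and the estimated response time are estimated according to the query mode and the service description information, and the node overhead information for each processing node of the system is estimated. Subtasks of the target task are allocated to the selected processing nodes, the remaining unallocated subtasks are determined, and the remaining subtasks are scheduled according to the expected response time, the system overhead information, and the estimated response time. The subtasks of the target task are thus processed in batches within the expected response time, which reduces system overhead while meeting the expected response time, prevents overhead-related problems from lengthening the response time or hanging the system, and stabilizes response latency.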
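Embodiment 1
Referring to FIG. 1, a flowchart of the steps of an embodiment of a task allocation method (also referred to as a task scheduling method) of the present application is shown, which may specifically include the following steps:
Step 102: Analyze at least one query mode according to the target task, and obtain the expected response time of the query mode.
The embodiments of the present application can be applied to a real-time computing system. When a target task of the real-time computing system is processed, each target task can be composed of multiple subtasks in order to meet real-time requirements and increase processing speed, so that the system can use multiple processing nodes to process multiple subtasks simultaneously.
The target task may be a query task on a database, so the query modes of the target task can be analyzed. Different subtasks may have the same or different query modes, so at least one query mode can be derived for a target task. Query modes include local data filtering (local filtering), index lookup (index filtering), range queries, multi-value column queries, and so on.
The expected response time SLA (Service Level Agreement) of each query mode is then determined. The expected response time SLA is agreed with the user when the service is connected, based on an evaluation of a series of factors including the data volume, the query mode, and the number of devices used. The SLA need not be an exact value. Typically, a value RT_a is estimated from historical queries of the same type on the real-time computing system, and the user derives from their business system an acceptable expected value RT_b, where RT_b ≥ RT_a. RT_b can then serve as the SLA, and RT_a is the response time reflecting the system's expected latency capability for queries of the same type. In other words, the expected response time of the service is no shorter than the system's expected latency capability for queries of the same type. If RT_b < RT_a, the currently configured cluster resource allocation (number of devices, number of partitions) cannot meet the business requirement; the SLA must be renegotiated with the user and more resources (devices, partitions) allocated to reduce RT_a until, with both parties' agreement, RT_b ≥ RT_a is satisfied.
For example, for a data volume of 1 TB, the user uses 50 computing nodes and loads the 1 TB of data into 100 partitions, two partitions per computing node. For a join of two tables with an overall query filtering rate of 1/1000000, the system's response latency capability RT_a is 100 ms, while the expected value RT_b acceptable to the user's business system is 200 ms. Since RT_b > RT_a, the SLA for this query service is agreed as 200 ms.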
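The acceptance rule for the SLA (RT_b ≥ RT_a, otherwise renegotiate with more resources) can be written as a small check. This is a sketch only: the function name and the use of `None` to signal renegotiation are assumptions, and the application does not describe how RT_a changes as resources are added.

```python
from typing import Optional


def agree_sla(system_rt_ms: float, acceptable_rt_ms: float) -> Optional[float]:
    """Return the agreed SLA (RT_b) if RT_b >= RT_a, otherwise None.

    `system_rt_ms` is RT_a, the system's estimated latency capability;
    `acceptable_rt_ms` is RT_b, the value the user's business can accept.
    None signals that resources must be increased and the SLA renegotiated.
    """
    if acceptable_rt_ms >= system_rt_ms:
        return acceptable_rt_ms
    return None


# Figures from the example above: RT_a = 100 ms, RT_b = 200 ms.
sla_ms = agree_sla(system_rt_ms=100, acceptable_rt_ms=200)
```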
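Step 104: Estimate the system overhead information and the estimated response time according to the query mode and the service description information, and estimate the node overhead information for processing by each processing node of the system.
The service description information is statistical information related to the service, including, for example, the data cardinality of the service, the frequency of specific column values, and histograms. Based on the query mode and the service description information, the system overhead information and the estimated response time for executing the task can be estimated. The node overhead information incurred by each processing node when it processes a subtask delivered to it can also be estimated. Overhead information refers to the resource consumption required for devices in the system to process a task, such as CPU consumption, IO consumption, and network consumption.
Step 106: Select processing nodes according to the node overhead information, and allocate subtasks of the target task to the selected processing nodes.
After the node overhead information for each processing node to process a subtask has been determined, processing nodes can be selected according to that information, for example by selecting the processing nodes whose node overhead information meets the processing condition. Subtasks of the target task are then allocated to the selected processing nodes, which process them.
Step 108: Determine the remaining unallocated subtasks of the target task, and schedule the remaining subtasks according to the expected response time, the system overhead information, and the estimated response time.
A real-time computing system usually cannot process all subtasks of a target task at the same time. The remaining unallocated subtasks of the target task are therefore determined from the subtasks already allocated, the processing time of the remaining subtasks is calculated according to the expected response time, the system overhead information, and the estimated response time, and the remaining subtasks are then handled according to that processing time, for example by waiting for a delay before allocating processing nodes for them.
In summary, when a real-time computing system processes a target task, the query mode and its corresponding expected response time can be determined. With the expected response time as the baseline, the system overhead information and the estimated response time are estimated according to the query mode and the service description information, and the node overhead information for each processing node of the system is estimated. Subtasks of the target task are allocated to the selected processing nodes, the remaining unallocated subtasks are determined, and the remaining subtasks are scheduled according to the expected response time, the system overhead information, and the estimated response time. The subtasks of the target task are thus processed in batches within the expected response time, which reduces system overhead while meeting the expected response time, prevents overhead-related problems from lengthening the response time or hanging the system, and stabilizes response latency.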
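Embodiment 2
On the basis of the above embodiment, this embodiment describes in detail the task allocation steps of the real-time computing system.
A structural block diagram of the real-time computing system is shown in FIG. 2.
The real-time computing system includes: a metadata module (SLA Metadata), a query mode analysis module (Query Pattern Analyzer), a statistics module (Statistics), an overhead and response time analysis module (Cost and RT Analyzer), a query task scheduling module (Query Task Scheduler), a long tail processing module (Tail Tolerance Scheduler), a system status monitoring module (System Status Monitor), and a computing node cluster (Computer Node Cluster).
Specifically:
The metadata module (SLA Metadata) stores the response time SLAs of specific query modes in all service-related real-time query workloads. When a business system is connected to this real-time computing system, the expected response time of a specific query mode for a specific data volume is agreed and stored as SLA metadata, which the Query Pattern Analyzer module uses to look up the expected response time (RT SLA) matching a specific query.
The query mode analysis module (Query Pattern Analyzer) performs fast pattern analysis on received queries. The main query modes include local data filtering (local filtering), index lookup (index filtering), hash join, subqueries, range queries, multi-value column queries, and so on. After the query mode has been analyzed, the service's RT SLA for that class of query mode can be read from SLA Metadata and used as input to the Query Task Scheduler module. The query mode is also used as input to the Cost and RT Analyzer module for overhead and RT estimation.
The statistics module (Statistics) stores the service description information that describes the data of the business system. The service description information includes the data cardinality, the frequency of specific column values, histograms, and so on, and can be updated periodically or in real time to reflect the current state of the business data.
The overhead and response time analysis module (Cost and RT Analyzer) analyzes overhead information and estimates the response time. The overhead information includes system overhead information and node overhead information. The Cost and RT Analyzer module reads the statistics in order to analyze the overhead of a query, such as central processing unit (CPU) overhead, input/output (IO) overhead, and network overhead, and to estimate its RT.
The query task scheduling module (Query Task Scheduler) schedules the target task, that is, it makes the scheduling and delivery decisions for subtasks. It can read the query mode, the estimated query overhead and RT, and the system running status information in order to schedule the subtasks of a query. Because the computing tasks of a distributed real-time computing system generally consist of many parallelizable subtasks, the Query Task Scheduler module also coordinates with the already implemented long tail processing module (Tail Tolerance Scheduler) to perform long tail processing when appropriate.
The long tail processing module (Tail Tolerance Scheduler) performs long tail processing on subtasks in the system.
The system status monitoring module (System Status Monitor) collects the system running status of the processing nodes in real time.
The computing node cluster (Computer Node Cluster) performs query processing on subtasks and includes multiple processing nodes. It receives subtask requests sent by the server side and returns the computation results. A distributed computing node cluster consists of multiple servers and is usually divided into multiple replicas, each of which contains the complete data. Each replica is further divided into multiple partitions, each containing part of the data, and partition queries can be executed in parallel. One server (computing node) can store the data of one or more partitions.
Based on the above real-time computing system, subtasks can be scheduled sensibly while meeting the agreed expected response time SLA, stabilizing the response time of the real-time computing system. The specific steps are as follows: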
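Referring to FIG. 3, a flowchart of the steps of another embodiment of a task allocation method of the present application is shown, which may specifically include the following steps:
Step 302: Analyze at least one query mode according to the target task.
The database tables are queried according to the target task, and the query mode of the corresponding column of each database table is determined. For example, the query statement for the user database db1 is as follows: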
SELECT tab1.col1, tab1.col2, tab2.col1, tab2.col2
FROM tab1 JOIN tab2 ON tab1.col3 = tab2.col3
WHERE tab1.col1 = 'value1'
AND tab2.col2 = 'value2'
AND tab2.col4 IN ('value3', 'value4', 'value5', 'value6')
AND tab2.col5 > 100000
AND tab2.col6 IN (SELECT tab3.col1 FROM tab3 WHERE tab3.col2 = 'value7')
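After database db1 is queried with the above statement, the following six query modes can be obtained:
local data filtering on column col1 of table tab1;
local data filtering on column col2 of table tab2;
a multi-value column query on column col4 of table tab2 (col4 is a multi-value column);
a range query on column col5 of table tab2;
a subquery on column col6 of table tab2, where the subquery matches a single-table query;
local data filtering on column col2 of table tab3.
In this way, the query modes can be analyzed from the corresponding database tables by means of the query statement.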
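Step 304: Acquire the expected response time of the query mode.
The expected response time of each query mode is then read from the SLA Metadata, from which the total expected response time of the target task can be obtained. In the above example, the value read from the user's SLA Metadata gives an SLA response time (RT) of 400 ms for database db1 with the above six query modes.
Step 306: Pre-deliver the subtasks in the target task to each processing node in the system.
In this embodiment, to prevent a computing node from handling so many tasks that task latency is affected, the subtasks can be pre-delivered (that is, pre-assigned) to each processing node in the system before they are formally allocated. This makes it possible to calculate each processing node's response time and resource consumption for a subtask, and thereby to determine whether to allocate the subtask.
Step 308: Read the service description information of the database table corresponding to each query mode.
Step 310: Estimate the amount of data to be scanned for each query mode according to the service description information.
Step 312: Estimate the overhead information and the estimated response time according to the data amount.
The Cost and RT Analyzer reads from Statistics the service description information of the database table corresponding to each query mode, such as the data cardinality, the frequency of specific column values, and the histogram. Based on the service description information, it estimates the amount of data each query mode must scan, that is, the number of records to be scanned, and then combines this with the CPU, IO, and network overhead units to estimate the overhead information and the RT value. The overhead information includes system overhead information and node overhead information. That is, the above process estimates the system overhead information and RT value for the system to process the target task, as well as the node overhead information and RT value required for each processing node to process a subtask.
Taking the estimation of system overhead information as an example, the query cost and RT estimation can adopt the following formula:
Query cost = CPU overhead + IO overhead + network overhead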
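Taking the query modes of the user's target task in the above example, the calculation covers the following six specific query modes, labeled individually:
A. local data filtering on column col1 of table tab1 (= 'value1');
B. local data filtering on column col2 of table tab2 (= 'value2');
C. a multi-value column query on column col4 of table tab2 (col4 is a multi-value column) (four values);
D. a range query on column col5 of table tab2 (> 100000);
E. a subquery on column col6 of table tab2, where the subquery matches a single-table query (the query on tab3 below);
F. local data filtering on column col2 of table tab3 (= 'value7').
The service description information of the database table corresponding to each query mode is read from Statistics. Suppose the service description information for these query modes is as follows:
the cardinality of column col1 of table tab1 is 5;
the frequency of the value "value1" in column col1 of table tab1 is 10/100000000;
the cardinality of column col2 of table tab2 is 8;
the frequency of the value "value2" in column col2 of table tab2 is 1/100000000;
the cardinality of column col4 (a multi-value column) of table tab2 is 1000000;
the cardinality of column col5 of table tab2 is 10000;
values of column col5 of table tab2 greater than 100000 account for 20% of the total number of records;
the cardinality of column col2 of table tab3 is 10;
the frequency of the value "value7" in column col2 of table tab3 is 1/1000.
Based on this service description information and the specific query modes A, B, C, D, E, and F, the number of records to be scanned is estimated and combined with the CPU, IO, and network overhead units to obtain the total overhead (that is, the system overhead information) and the RT value.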
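Step 314: Determine, for each processing node, whether its node overhead information is lower than a node threshold.
The query task scheduling module can read the query mode, the estimated overhead information, and the RT value, and read the system running status information from the System Status Monitor, in order to schedule subtasks.
A node threshold is configured in the system in this embodiment and is used to measure a processing node's capacity to process tasks. The node overhead information of each processing node can therefore be compared with the node threshold to determine whether subtasks can be allocated to that node.
If yes, that is, the node overhead information of the processing node is lower than the node threshold, step 316 is performed; if not, that is, the node overhead information of the processing node is not lower than the node threshold, step 322 is performed.
Step 316: Allocate the pre-delivered subtask to the processing node.
When the node overhead information of a processing node is lower than the node threshold, the node is capable of processing subtasks in real time, so the previously pre-delivered subtask is allocated to it; other subtasks can of course also be allocated to it. The subtask is then processed by that processing node.
Step 318: Determine whether to perform long tail processing.
After a subtask is allocated to a processing node, the node needs to finish processing it within a specified time, such as the expected response time. The expected response time may be the time for that subtask alone, or the total time for all subtasks delivered at the same time. If the subtask is not completed within the specified time, long tail processing can be triggered, for example by re-delivering the subtask for reprocessing.
If yes, that is, long tail processing is performed, step 320 is performed; if not, that is, long tail processing is not performed, step 322 is performed.
Step 320: Reclaim the subtask.
After the total processing time is reached, if an allocated subtask has not been completed, long tail processing can be performed, that is, the subtask is reclaimed and re-delivered for processing.
If long tail processing occurs, the reclaimed subtasks are added to the remaining subtasks. In this embodiment, scheduling the remaining subtasks according to the expected response time, the system overhead information, and the estimated response time includes: counting the total processing time corresponding to the allocated subtasks; estimating the delay time of the remaining subtasks according to the expected response time, the system overhead information, the estimated response time, and the total processing time; and, after delaying the remaining subtasks by the delay time, allocating processing nodes for the remaining subtasks.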
步骤322,估算剩余子任务的延时时间。
步骤324,计算延时时间。
步骤326,在达到延时时间时,为所述剩余子任务分配处理节点进行处理。
本申请一个可选实施例中,所述按照所述期望响应时间、系统开销信息、估计响应时间和总处理时间,估算所述剩余子任务的延时时间,包括:依据所述估计响应时间、系统开销信息估算所述剩余子任务的处理时间;依据所述期望响应时间和所述剩余子任务的处理时间计算所述剩余子任务的延时时间。
本申请另一个可选实施例中,按照所述延时时间对所述剩余子任务进行延时后,为剩余子任务分配处理节点,包括:在分配的子任务处理完毕后,开始计算延时时间;在达到延时时间时,为所述剩余子任务分配处理节点进行处理。
针对剩余子任务可以执行延时处理,即首先可以统计分配的子任务对应的总处理时间,例如依据每个子任务的期望响应时间估算总处理时间。然后按照所述期望响应时间、系统开销信息、估计响应时间和总处理时间,估算所述剩余子任务的延时时间,即依据所述估计响应时间、系统开销信息估算出剩余子任务的处理时间,然后按照目标任务总的期望响应时间、剩余子任务的处理时间和分配的子任务对应的总处理时间,估算出剩余子任务的延时时间。
在分配的子任务处理完毕或者达到总处理时间后,开始计算延时时间,如启动一计时器开始计时,在达到延时时间时,为所述剩余子任务分配处理节点进行处理,例如可以将剩余子任务分配给之前预下发的处理节点进行处理,或者其他比较空闲的处理节点。
例如,假设上例查询模式对应商定的SLA为200ms。目标任务中子任务总共100个,经过开销估算,确定有60个处理节点的节点开销信息低于节点阈值,先下发60个子任务到相应的计算节点,计算总处理时间为50ms。然后评估这已分配的60个子任务所在计算节点系统运行状态、数据量和任务开销,与剩余40个子任务的目标节点系统运行状态、数据量和任务开销进行比较,预估40个子任务处理时间在100ms左右,其中;处理时间的耗时可以采用线性关系预估。
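Delay control is then applied to the remaining subtasks. First, the delivery delay of the remaining 40 subtasks is calculated as: delay time = 200 - 50 - 100 - T_schedule_cost. Here T_schedule_cost is an empirical value: running the decision and scheduling algorithm itself incurs some computation overhead, which is taken from historical statistics and is generally very short. Assuming T_schedule_cost is 1 ms, the delivery delay is 49 ms.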
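The arithmetic of this example can be written as a small helper. The function and parameter names are illustrative, and clamping a negative result to zero is an assumption added here; the source does not say what happens when the budget is already exhausted.

```python
def dispatch_delay_ms(sla_ms: float,
                      assigned_total_ms: float,
                      remaining_estimate_ms: float,
                      schedule_cost_ms: float = 1.0) -> float:
    """Delay before delivering the remaining subtasks.

    delay = SLA - total time of assigned subtasks
                - estimated time of remaining subtasks
                - T_schedule_cost
    """
    delay = sla_ms - assigned_total_ms - remaining_estimate_ms - schedule_cost_ms
    # Assumption: if no slack is left, deliver immediately instead of waiting.
    return max(delay, 0.0)


# Values from the example: SLA 200 ms, 50 ms assigned, 100 ms remaining, 1 ms cost.
delay = dispatch_delay_ms(200, 50, 100, 1)
```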
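The delivery delay may be timed from the moment the above 60 subtasks complete and return, or from the moment long-tail processing recycles subtasks. After waiting 49 ms, the remaining unfinished compute subtasks are delivered. From the perspective of the compute node cluster, this 49 ms delay effectively balances and eases resource utilization in the real-time computing system.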
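A minimal sketch of the wait-then-deliver step is shown below. The function name, the `dispatch` callback, and the injectable `sleep` parameter are assumptions for illustration; a production scheduler would more likely use a non-blocking timer than a blocking sleep.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def dispatch_after_delay(remaining: Iterable[T],
                         delay_ms: float,
                         dispatch: Callable[[T], R],
                         sleep: Callable[[float], None] = time.sleep) -> List[R]:
    """Wait for the delivery delay, then hand each remaining subtask to `dispatch`."""
    # Timing starts when the caller invokes this function, that is, once the
    # first batch has returned or long-tail recycling has happened.
    sleep(delay_ms / 1000.0)
    return [dispatch(task) for task in remaining]


# Usage with a stand-in sleep so the example does not actually block.
waits: List[float] = []
results = dispatch_after_delay(["t61", "t62"], 49, lambda t: t + ":sent",
                               sleep=waits.append)
```

Passing the sleep function in makes the delay easy to replace in tests or with an event-loop timer.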
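Figure 4 shows simulated curves that compare, without scaling out the cluster, how system response time changes as access volume grows for an existing real-time computing system, a system that uses only long-tail processing, and the cluster of this embodiment.
In Figure 4, the dashed line represents the existing real-time computing system. As concurrent access grows, the average RT rises gradually; when the system bottleneck is reached, RT rises sharply and the system eventually hangs.
The dash-dot line represents the long-tail processing system. Within the bottleneck threshold of concurrent access the average RT is fairly stable, but the bottleneck threshold is not improved, and the retransmission policy may even cause the system to reach it earlier.
The solid line represents the real-time computing system of this embodiment. Although the average RT is higher, it stays within the SLA required by the business, overall resource utilization is fairly smooth, growth in access has little effect on RT, and the system bottleneck threshold is raised.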
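In summary, the embodiments of this application provide a new SLA-based technique for business access. While meeting the business's SLA for real-time computing responses, it keeps the response latency of real-time computing requests as stable as possible and raises the system bottleneck threshold. It consists of two main steps: 1) estimating query overhead information and response latency from the query pattern and business description information; 2) online scheduling of compute tasks based on the business SLA.
First, for each query pattern, the data volume, the business description information, and the response times of historically similar query patterns are used to perform a fast pattern-matching analysis of the query. This yields an approximate estimate of the overhead information (such as CPU overhead and IO overhead) and of the response latency, on which later online scheduling decisions are based.
Second, real-time query computing differs from an offline data warehouse: the query patterns contained in its query workload are generally known in advance, and the required response latency is at the millisecond level (a few ms to several hundred ms) or within tens of seconds. The estimation described above can therefore predict the response time and overhead of these queries fairly accurately, which makes it possible to agree with the user on a response-latency SLA per query pattern (for queries of the same type, usually the longest response latency under the resource configuration the user has requested). The system then schedules online compute tasks flexibly on the basis of that SLA, optionally combined with long-tail processing, to optimize resource utilization, raise the system bottleneck threshold, and so stabilize response latency.
It should be noted that, for simplicity of description, the method embodiments are presented as a series of combined actions. Those skilled in the art should understand, however, that the embodiments of this application are not limited by the described order of actions, because according to these embodiments some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of this application.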
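Embodiment 3
On the basis of the above embodiments, this embodiment further provides a task allocation (which may also be called scheduling) system.
Referring to Figure 5, a structural block diagram of an embodiment of a task allocation system of this application is shown, which may specifically include the following modules:
Query pattern analysis module 502, configured to analyze at least one query pattern from the target task and obtain the expected response time of the query pattern.
Consumption and response time analysis module 504, configured to estimate the system overhead information and the estimated response time from the query pattern and business description information, and to estimate the node overhead information of each processing node of the system.
Query task scheduling module 506, configured to select processing nodes according to the node overhead information and assign subtasks of the target task to the selected processing nodes; and to determine the remaining, unassigned subtasks of the target task and schedule them according to the expected response time, the system overhead information, and the estimated response time.
In this embodiment, the task allocation (scheduling) system can be regarded as a subsystem of the real-time computing system. The query pattern analysis module 502 may obtain the expected response time of a query pattern from the metadata module. The connections between the modules of the task scheduling system and the other modules of the real-time computing system are as shown in Figure 4 and the corresponding embodiment, and are not repeated here.
In summary, when a real-time computing system processes a target task, the query pattern and its corresponding expected response time can be determined. With the expected response time as the baseline, the system overhead information and the estimated response time are estimated from the query pattern and business description information, along with the node overhead information of each processing node. Subtasks of the target task are assigned to the selected processing nodes, the remaining unassigned subtasks are determined, and those remaining subtasks are scheduled according to the expected response time, the system overhead information, and the estimated response time. The subtasks of the target task are thus processed in batches within the expected response time, which reduces system overhead while still meeting the expected response time, prevents overhead and similar problems from lengthening the response time or hanging the system, and stabilizes response latency.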
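Referring to Figure 6, a structural block diagram of another embodiment of a task scheduling system of this application is shown, which may specifically include the following modules:
Query pattern analysis module 602, configured to analyze at least one query pattern from the target task and obtain the expected response time of the query pattern.
Consumption and response time analysis module 604, configured to estimate the system overhead information and the estimated response time from the query pattern and business description information, and to estimate the node overhead information of each processing node of the system.
Query task scheduling module 606, configured to select processing nodes according to the node overhead information and assign subtasks of the target task to the selected processing nodes; and to determine the remaining, unassigned subtasks of the target task and schedule them according to the expected response time, the system overhead information, and the estimated response time.
Long-tail processing module 608, configured to recycle any assigned subtask that is still unfinished once the total processing time has been reached. The query task scheduling module 606 is further configured to schedule the recycled subtasks as remaining subtasks.
The consumption and response time analysis module 604 includes: a reading submodule 60402, configured to read the business description information of the database tables corresponding to each query pattern, where the business description information describes the business data of the system; a data volume estimation submodule 60404, configured to estimate, from the business description information, the data volume scanned for each query pattern; and an overhead and response estimation submodule 60406, configured to estimate the system overhead information and the estimated response time from the data volume, where the system overhead information includes at least one of the following: processor overhead information, input/output overhead information, and network overhead information.
The query task scheduling module 606 is further configured to pre-deliver the subtasks of the target task to the processing nodes of the system.
The consumption and response time analysis module 604 is configured to determine, for each processing node, the business description information from the database table corresponding to the query pattern of the subtask, and to estimate from that business description information the node overhead information of the processing node for processing the subtask.
The query task scheduling module 606 includes a first allocation submodule 60602, a delay calculation submodule 60604, and a second allocation submodule 60606.
The first allocation submodule 60602 is configured to determine, for each processing node, whether its node overhead information is below the node threshold, and to assign the pre-delivered subtask to the processing node when its node overhead information is below the node threshold.
The delay calculation submodule 60604 is configured to compute the total processing time of the assigned subtasks, and to estimate the delay time of the remaining subtasks from the expected response time, the system overhead information, the estimated response time, and the total processing time.
The second allocation submodule 60606 is configured to assign processing nodes to the remaining subtasks after delaying them by the delay time.
The delay calculation submodule 60604 is configured to estimate the processing time of the remaining subtasks from the estimated response time and the system overhead information, and to compute the delay time of the remaining subtasks from the expected response time and that processing time.
The second allocation submodule 60606 is configured to start timing the delay once the assigned subtasks have been processed, and to assign processing nodes to the remaining subtasks for processing when the delay time is reached.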
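Because the apparatus embodiments are essentially similar to the method embodiments, they are described only briefly; for related details, refer to the corresponding parts of the method embodiments.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the parts that are the same or similar across embodiments may be cross-referenced.
Those skilled in the art should understand that the embodiments of this application may be provided as a method, an apparatus, or a computer program product. The embodiments may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.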
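In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.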
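The embodiments of this application are described with reference to flowcharts and/or block diagrams of the method, terminal device (system), and computer program product according to the embodiments. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in them, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, so that a series of operational steps is performed on the computer or other programmable terminal device to produce computer-implemented processing, and the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of this application have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of this application.
Finally, it should also be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants of them are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or terminal device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.
The task allocation (scheduling) method and the task allocation (scheduling) system provided by this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the above embodiments is intended only to help in understanding the method of this application and its core idea. A person of ordinary skill in the art may, following the idea of this application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting this application.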

Claims (18)

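  1. A task allocation method, characterized by comprising:
    analyzing at least one query pattern from a target task, and obtaining the expected response time of the query pattern;
    estimating system overhead information and an estimated response time from the query pattern and business description information, and estimating the node overhead information of each processing node of the system;
    selecting processing nodes according to the node overhead information, and assigning subtasks of the target task to the selected processing nodes;
    determining the remaining, unassigned subtasks of the target task, and scheduling the remaining subtasks according to the expected response time, the system overhead information, and the estimated response time.
  2. The method according to claim 1, characterized in that estimating the system overhead information and the estimated response time from the query pattern and business description information comprises:
    reading the business description information of the database tables corresponding to each query pattern, wherein the business description information describes the business data of the system;
    estimating, from the business description information, the data volume scanned for each query pattern;
    estimating the system overhead information and the estimated response time from the data volume, wherein the system overhead information includes at least one of the following: processor overhead information, input/output overhead information, and network overhead information.
  3. The method according to claim 1, characterized in that, before estimating the node overhead information of each processing node of the system, the method further comprises:
    pre-delivering the subtasks of the target task to the processing nodes of the system.
  4. The method according to claim 3, characterized in that estimating the node overhead information of each processing node of the system comprises:
    for each processing node, determining the business description information from the database table corresponding to the query pattern of the subtask;
    estimating, from the business description information, the node overhead information of the processing node for processing the subtask.
  5. The method according to claim 1 or 4, characterized in that selecting processing nodes according to the node overhead information and assigning subtasks of the target task to the selected processing nodes comprises:
    determining, for each processing node, whether its node overhead information is below a node threshold;
    when the node overhead information of the processing node is below the node threshold, assigning the pre-delivered subtask to the processing node.
  6. The method according to claim 4, characterized in that scheduling the remaining subtasks according to the expected response time, the system overhead information, and the estimated response time comprises:
    computing the total processing time of the assigned subtasks;
    estimating the delay time of the remaining subtasks from the expected response time, the system overhead information, the estimated response time, and the total processing time;
    after delaying the remaining subtasks by the delay time, assigning processing nodes to the remaining subtasks.
  7. The method according to claim 6, characterized in that estimating the delay time of the remaining subtasks from the expected response time, the system overhead information, the estimated response time, and the total processing time comprises:
    estimating the processing time of the remaining subtasks from the estimated response time and the system overhead information;
    computing the delay time of the remaining subtasks from the expected response time and the processing time of the remaining subtasks.
  8. The method according to claim 6, characterized in that assigning processing nodes to the remaining subtasks after delaying them by the delay time comprises:
    starting to time the delay once the assigned subtasks have been processed;
    when the delay time is reached, assigning processing nodes to the remaining subtasks for processing.
  9. The method according to claim 1, characterized by further comprising:
    once the total processing time has been reached, recycling any assigned subtask that is still unfinished;
    scheduling the recycled subtasks as remaining subtasks.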
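  10. A task allocation system, characterized by comprising:
    a query pattern analysis module, configured to analyze at least one query pattern from a target task and obtain the expected response time of the query pattern;
    a consumption and response time analysis module, configured to estimate system overhead information and an estimated response time from the query pattern and business description information, and to estimate the node overhead information of each processing node of the system;
    a query task scheduling module, configured to select processing nodes according to the node overhead information and assign subtasks of the target task to the selected processing nodes; and to determine the remaining, unassigned subtasks of the target task and schedule the remaining subtasks according to the expected response time, the system overhead information, and the estimated response time.
  11. The system according to claim 10, characterized in that the consumption and response time analysis module comprises:
    a reading submodule, configured to read the business description information of the database tables corresponding to each query pattern, wherein the business description information describes the business data of the system;
    a data volume estimation submodule, configured to estimate, from the business description information, the data volume scanned for each query pattern;
    an overhead and response estimation submodule, configured to estimate the system overhead information and the estimated response time from the data volume, wherein the system overhead information includes at least one of the following: processor overhead information, input/output overhead information, and network overhead information.
  12. The system according to claim 10, characterized in that
    the query task scheduling module is further configured to pre-deliver the subtasks of the target task to the processing nodes of the system.
  13. The system according to claim 12, characterized in that
    the consumption and response time analysis module is configured to determine, for each processing node, the business description information from the database table corresponding to the query pattern of the subtask, and to estimate from the business description information the node overhead information of the processing node for processing the subtask.
  14. The system according to claim 10 or 13, characterized in that the query task scheduling module comprises:
    a first allocation submodule, configured to determine, for each processing node, whether its node overhead information is below a node threshold, and to assign the pre-delivered subtask to the processing node when its node overhead information is below the node threshold.
  15. The system according to claim 13, characterized in that the query task scheduling module comprises:
    a delay calculation submodule, configured to compute the total processing time of the assigned subtasks, and to estimate the delay time of the remaining subtasks from the expected response time, the system overhead information, the estimated response time, and the total processing time;
    a second allocation submodule, configured to assign processing nodes to the remaining subtasks after delaying them by the delay time.
  16. The system according to claim 15, characterized in that
    the delay calculation submodule is configured to estimate the processing time of the remaining subtasks from the estimated response time and the system overhead information, and to compute the delay time of the remaining subtasks from the expected response time and the processing time of the remaining subtasks.
  17. The system according to claim 15, characterized in that
    the second allocation submodule is configured to start timing the delay once the assigned subtasks have been processed, and to assign processing nodes to the remaining subtasks for processing when the delay time is reached.
  18. The system according to claim 10, characterized by further comprising:
    a long-tail processing module, configured to recycle any assigned subtask that is still unfinished once the total processing time has been reached;
    the query task scheduling module being further configured to schedule the recycled subtasks as remaining subtasks.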
PCT/CN2016/098187 2015-09-15 2016-09-06 Task allocation method and system WO2017045553A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/922,805 US10936364B2 (en) 2015-09-15 2018-03-15 Task allocation method and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510587698.6A CN106528280B (zh) 2015-09-15 2015-09-15 Task allocation method and system
CN201510587698.6 2015-09-15

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/922,805 Continuation US10936364B2 (en) 2015-09-15 2018-03-15 Task allocation method and system

Publications (1)

Publication Number Publication Date
WO2017045553A1 true WO2017045553A1 (zh) 2017-03-23

Family

ID=58288104

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/098187 WO2017045553A1 (zh) 2015-09-15 2016-09-06 Task allocation method and system

Country Status (3)

Country Link
US (1) US10936364B2 (zh)
CN (1) CN106528280B (zh)
WO (1) WO2017045553A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109995824A (zh) * 2017-12-29 2019-07-09 阿里巴巴集团控股有限公司 一种对等网络中的任务调度方法及装置
US10730985B2 (en) 2016-12-19 2020-08-04 Bridgestone Corporation Functionalized polymer, process for preparing and rubber compositions containing the functionalized polymer
CN111522641A (zh) * 2020-04-21 2020-08-11 北京嘀嘀无限科技发展有限公司 任务调度方法、装置、计算机设备和存储介质
US10936364B2 (en) 2015-09-15 2021-03-02 Alibaba Group Holding Limited Task allocation method and system
CN113450002A (zh) * 2021-07-01 2021-09-28 京东科技控股股份有限公司 任务的分配方法、装置、电子设备及存储介质
CN115391018A (zh) * 2022-09-30 2022-11-25 中国建设银行股份有限公司 任务调度方法、装置和设备

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107294774B (zh) * 2017-06-08 2020-07-10 深圳市迈岭信息技术有限公司 分布式系统物理节点的任务部署方法
CN107547270A (zh) * 2017-08-14 2018-01-05 天脉聚源(北京)科技有限公司 一种智能分配任务分片的方法及装置
CN108595254B (zh) * 2018-03-09 2022-02-22 北京永洪商智科技有限公司 一种查询调度方法
CN110489224A (zh) * 2018-05-15 2019-11-22 北京京东尚科信息技术有限公司 一种任务调度的方法和装置
CN108804378A (zh) * 2018-05-29 2018-11-13 郑州易通众联电子科技有限公司 一种计算机数据处理方法及系统
CN109582696B (zh) * 2018-10-09 2023-07-04 北京奥星贝斯科技有限公司 扫描任务的生成方法及装置、电子设备
CN109245949B (zh) * 2018-10-31 2022-03-01 新华三技术有限公司 一种信息处理方法及装置
CN109783224B (zh) * 2018-12-10 2022-10-14 平安科技(深圳)有限公司 基于负载调配的任务分配方法、装置及终端设备
CN111381956B (zh) * 2018-12-28 2024-02-27 杭州海康威视数字技术股份有限公司 一种任务处理的方法、装置及云分析系统
US10944644B2 (en) * 2019-04-30 2021-03-09 Intel Corporation Technologies for thermal and power awareness and management in a multi-edge cloud networking environment
CN111290847B (zh) * 2020-03-03 2023-07-07 舟谱数据技术南京有限公司 去中心化的分布式多节点协同任务调度平台及方法
CN113298332A (zh) * 2020-04-17 2021-08-24 阿里巴巴集团控股有限公司 信息处理方法、装置及电子设备
CN111813513B (zh) * 2020-06-24 2024-05-14 中国平安人寿保险股份有限公司 基于分布式的实时任务调度方法、装置、设备及介质
CN113760441A (zh) * 2020-08-31 2021-12-07 北京京东尚科信息技术有限公司 容器创建方法、装置、电子设备及存储介质
CN112148575A (zh) * 2020-09-22 2020-12-29 京东数字科技控股股份有限公司 一种信息处理方法、装置、电子设备及存储介质
CN112486658A (zh) * 2020-12-17 2021-03-12 华控清交信息科技(北京)有限公司 一种任务调度方法、装置和用于任务调度的装置
CN112596879B (zh) * 2020-12-24 2023-06-16 中国信息通信研究院 用于量子云计算平台任务调度的方法
CN112817724A (zh) * 2021-02-05 2021-05-18 苏州互方得信息科技有限公司 一种可动态编排顺序的任务分配方法
CN114339419B (zh) * 2021-12-29 2024-04-02 青岛海信移动通信技术有限公司 一种视频流拉流处理的方法、装置及存储介质
CN115665805A (zh) * 2022-12-06 2023-01-31 北京交通大学 一种面向点云分析任务的边缘计算资源调度方法和系统
CN115955481A (zh) * 2022-12-12 2023-04-11 支付宝(杭州)信息技术有限公司 应急响应方法和装置
CN117785488B (zh) * 2024-02-27 2024-04-26 矩阵起源(深圳)信息科技有限公司 查询调度方法、装置、设备及计算机可读存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120324111A1 (en) * 2011-06-16 2012-12-20 Ron Barzel Task allocation in a computer network
CN102902344A (zh) * 2011-12-23 2013-01-30 同济大学 基于随机任务的云计算系统能耗优化方法
CN103713949A (zh) * 2012-10-09 2014-04-09 鸿富锦精密工业(深圳)有限公司 动态任务分配系统及方法
CN103841208A (zh) * 2014-03-18 2014-06-04 北京工业大学 基于响应时间最优化的云计算任务调度方法
CN104852819A (zh) * 2015-05-21 2015-08-19 杭州天宽科技有限公司 一种基于异构MapReduce集群的面向SLA的能耗管理方法

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6026391A (en) * 1997-10-31 2000-02-15 Oracle Corporation Systems and methods for estimating query response times in a computer system
CN101739451A (zh) * 2009-12-03 2010-06-16 南京航空航天大学 一种网格数据库连接查询自适应处理方法
CN103177120B (zh) * 2013-04-12 2016-03-30 同方知网(北京)技术有限公司 一种基于索引的XPath查询模式树匹配方法
CN103235835B (zh) * 2013-05-22 2017-03-29 曙光信息产业(北京)有限公司 用于数据库集群的查询实现方法和装置
CN104424287B (zh) * 2013-08-30 2019-06-07 深圳市腾讯计算机系统有限公司 数据查询方法和装置
CN106528280B (zh) 2015-09-15 2019-10-29 阿里巴巴集团控股有限公司 一种任务分配方法和系统

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10936364B2 (en) 2015-09-15 2021-03-02 Alibaba Group Holding Limited Task allocation method and system
US10730985B2 (en) 2016-12-19 2020-08-04 Bridgestone Corporation Functionalized polymer, process for preparing and rubber compositions containing the functionalized polymer
CN109995824A (zh) * 2017-12-29 2019-07-09 阿里巴巴集团控股有限公司 一种对等网络中的任务调度方法及装置
CN111522641A (zh) * 2020-04-21 2020-08-11 北京嘀嘀无限科技发展有限公司 任务调度方法、装置、计算机设备和存储介质
CN111522641B (zh) * 2020-04-21 2023-11-14 北京嘀嘀无限科技发展有限公司 任务调度方法、装置、计算机设备和存储介质
CN113450002A (zh) * 2021-07-01 2021-09-28 京东科技控股股份有限公司 任务的分配方法、装置、电子设备及存储介质
CN115391018A (zh) * 2022-09-30 2022-11-25 中国建设银行股份有限公司 任务调度方法、装置和设备

Also Published As

Publication number Publication date
US10936364B2 (en) 2021-03-02
CN106528280A (zh) 2017-03-22
US20180276031A1 (en) 2018-09-27
CN106528280B (zh) 2019-10-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16845667

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16845667

Country of ref document: EP

Kind code of ref document: A1