CN113850428A - Job scheduling prediction processing method and device and electronic equipment - Google Patents

Job scheduling prediction processing method and device and electronic equipment

Info

Publication number
CN113850428A
Authority
CN
China
Prior art keywords
job
executed
prediction
plan
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111123438.5A
Other languages
Chinese (zh)
Inventor
陆明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN202111123438.5A
Publication of CN113850428A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a prediction processing method and apparatus for job scheduling, and an electronic device. After the job plan information of a submitted job to be executed is obtained, a prediction plan for the job is determined at least according to that job plan information, and prediction processing is performed on the job according to the prediction plan. The prediction processing predicts the resource index values and/or resource states of the system under different assumed job activities, in which the job is assumed to be executed in different future time periods, to obtain a prediction result. A set of available execution time windows for the job is then generated according to the prediction result, where each available execution time window in the set allows the resource state of the system resources to meet a state condition during the job activity of the job. A suitable job time window can thus be planned for the job from the set of available execution time windows, reducing resource competition with other services while the job is executed.

Description

Job scheduling prediction processing method and device and electronic equipment
Technical Field
The present application relates to the field of job scheduling and resource management, and in particular, to a method and an apparatus for performing prediction processing on job scheduling, and an electronic device.
Background
In a cloud computing environment, job activities share resources from the infrastructure level to the platform level with other cloud computing jobs, and may cause jobs that are already heavily loaded to run more slowly, or even to time out or fail. For example, during multi-job execution the Hadoop platform generates a large number of disk reads and writes, which may even reach the load threshold, and this in turn seriously affects platform storage and the application experience. It is therefore necessary to make a job scheduling plan in advance, so that scheduling reduces the resource competition that jobs impose on other services.
Disclosure of Invention
To this end, the application discloses the following technical solutions:
A prediction processing method for job scheduling, the method comprising:
acquiring job plan information of a submitted job to be executed;
determining a prediction plan at least according to the job plan information;
performing prediction processing on the job to be executed according to the prediction plan to obtain a prediction result, the prediction result comprising: resource index values and/or resource states of the system under different assumed job activities in which the job to be executed is assumed to be executed in different future time periods, respectively;
generating a set of available execution time windows for the job to be executed according to the prediction result, so as to schedule the job to be executed based on the set of available execution time windows, wherein each available execution time window in the set allows the resource state of the system resources to meet a state condition during execution of the job to be executed.
Optionally, determining a prediction plan at least according to the job plan information comprises:
acquiring job data of a historical job corresponding to the job to be executed, and monitoring data obtained by monitoring system resource indexes and/or resource states during execution of the historical job;
determining the prediction plan according to the job plan information and the job data and monitoring data of the historical job.
Optionally, the job plan information includes at least execution time plan information and node deployment plan information of the job to be executed, where the node deployment plan information indicates the plan information relevant to deploying the job to be executed to the corresponding nodes;
determining the prediction plan according to the job plan information and the job data and monitoring data of the historical job comprises:
determining, according to the node deployment plan information, each deployment position of the job to be executed and a first resource dependency relationship between each deployment position and the resources it depends on;
determining, at least according to the first resource dependency relationship, a second resource dependency relationship between the job to be executed deployed at each position and the resources it depends on;
determining the prediction plan according to the first resource dependency relationship, the second resource dependency relationship, and the job data and monitoring data of the historical job.
Optionally, the prediction plan includes a medium-term prediction plan and a short-term prediction plan for performing prediction processing on the job to be executed;
wherein each of the medium-term prediction plan and the short-term prediction plan comprises: a plurality of time slices obtained by dividing the corresponding prediction time range according to a preset step length, together with the prediction algorithm corresponding to each time slice and the parameters required by that algorithm; the prediction time range of a medium-term prediction plan contains, and is larger than, the prediction time range of the corresponding short-term prediction plan;
performing prediction processing on the job to be executed according to the prediction plan to obtain a prediction result comprises:
predicting, according to the prediction algorithm and parameters in the short-term prediction plan, the resource index value and/or resource state of the system when the job to be executed is executed in each time slice of the short-term prediction plan, to obtain a first prediction result;
predicting, according to the prediction algorithm and parameters in the medium-term prediction plan, the resource index value and/or resource state of the system when the job to be executed is executed in each time slice of the medium-term prediction plan, to obtain a second prediction result.
Optionally, generating the set of available execution time windows for the job to be executed according to the prediction result comprises:
determining, according to the first prediction result and the second prediction result, whether the time point corresponding to each time slice is an available execution time point, an available execution time point being a time point at which executing the job to be executed allows the resource state of the system resources to meet the state condition;
merging the time points belonging to the same assumed job activity into a job time window;
determining whether all time points in each job time window are available execution time points;
if so, determining the job time window to be an available execution time window; if not, determining the job time window to be an unavailable execution time window, or determining whether it is an available execution time window through availability confirmation processing of the job time window;
wherein the determined available execution time windows constitute the set of available execution time windows.
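The merging and checking steps above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `TimePoint` and `JobWindow` types, the `activity_id` field, and the use of minute offsets are all assumptions introduced here, and the optional availability confirmation processing for mixed windows is omitted.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass(frozen=True)
class TimePoint:
    activity_id: str  # hypothetical identifier of one assumed job activity
    minute: int       # hypothetical offset of the time slice, in minutes
    available: bool   # whether this point is an available execution time point


@dataclass(frozen=True)
class JobWindow:
    activity_id: str
    start: int
    end: int
    available: bool


def build_windows(points: List[TimePoint]) -> List[JobWindow]:
    """Merge the time points of each assumed job activity into one job time window."""
    ordered = sorted(points, key=lambda p: (p.activity_id, p.minute))
    windows = []
    for activity_id, group in groupby(ordered, key=lambda p: p.activity_id):
        pts = list(group)
        # A window is available only if every time point in it is available.
        is_available = all(p.available for p in pts)
        windows.append(JobWindow(activity_id, pts[0].minute, pts[-1].minute, is_available))
    return windows


def available_window_set(points: List[TimePoint]) -> List[JobWindow]:
    """Keep only the windows whose time points are all available."""
    return [w for w in build_windows(points) if w.available]
```

Treating a window with any unavailable point as unavailable corresponds to the simpler of the two branches described above.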
Optionally, determining, according to the first prediction result and the second prediction result, whether the time point corresponding to each time slice is an available execution time point comprises:
aggregating the first prediction result and the second prediction result to obtain, for the time point corresponding to each time slice, a result set of resource index value and/or resource state prediction results;
determining, based on a preset strategy and the corresponding result set, whether the time point corresponding to each time slice is an available execution time point.
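As an illustration of aggregating the two prediction results and applying a preset strategy, the sketch below represents each prediction as a hypothetical `(time point, overloaded)` pair. The text does not specify the strategy; the `unanimous` rule (a time point is available only if no prediction reports overload) is an assumption chosen for simplicity.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical representation: (time point in minutes, predicted overloaded?)
Prediction = Tuple[int, bool]


def converge(first: Iterable[Prediction], second: Iterable[Prediction]) -> Dict[int, List[bool]]:
    """Collect, per time point, every overload prediction from both results."""
    result_sets: Dict[int, List[bool]] = defaultdict(list)
    for time_point, overloaded in list(first) + list(second):
        result_sets[time_point].append(overloaded)
    return dict(result_sets)


def unanimous(overload_flags: List[bool]) -> bool:
    """Assumed strategy: available only if no prediction reports overload."""
    return not any(overload_flags)


def available_points(
    first: Iterable[Prediction],
    second: Iterable[Prediction],
    strategy: Callable[[List[bool]], bool] = unanimous,
) -> Dict[int, bool]:
    """Map each time point to whether it is an available execution time point."""
    return {t: strategy(flags) for t, flags in converge(first, second).items()}
```

A different preset strategy, such as a majority vote or a weighting that favors the short-term result, could be passed as `strategy` without changing the aggregation step.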
Optionally, the method further comprises:
for a job time window in which all time points are available execution time points, determining whether the job time window overlaps with an input time interval that cannot be used for executing the job;
if it overlaps, determining the job time window to be an unavailable execution time window;
if it does not overlap, determining the job time window to be an available execution time window.
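The overlap check can be sketched with a standard interval-intersection test. The representation of windows and blocked intervals as half-open `(start, end)` pairs of minute offsets is an assumption made for this example.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # hypothetical half-open interval [start, end) in minutes


def overlaps(a: Interval, b: Interval) -> bool:
    """Two half-open intervals overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def filter_blocked(windows: Iterable[Interval], blocked: Iterable[Interval]) -> List[Interval]:
    """Keep only the candidate windows that overlap none of the blocked intervals."""
    blocked = list(blocked)
    return [w for w in windows if not any(overlaps(w, b) for b in blocked)]
```

With half-open intervals, a window that ends exactly when a blocked interval begins is not treated as overlapping; a closed-interval convention would need `<=` in both comparisons.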
Optionally, after generating the set of available execution time windows for the job to be executed, the method further comprises:
if the set of available execution time windows is empty, performing dimensionality reduction processing on the job parameters of the job to be executed, and determining a matching set of available execution time windows for the job to be executed with the reduced job parameters;
wherein the parameter combination of the job to be executed obtained after the dimensionality reduction processing allows the job result of the job to be executed to meet a quality condition.
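One possible reading of this fallback is sketched below. It assumes that "dimensionality reduction" means trying scaled-down parameter combinations (for example, lower concurrency) in turn. The callbacks `find_windows` and `meets_quality`, and the candidate list, are hypothetical placeholders for the prediction pipeline and the quality condition, neither of which is detailed in the text.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Params = Dict[str, int]
Window = Tuple[int, int]


def plan_with_fallback(
    params: Params,
    find_windows: Callable[[Params], List[Window]],
    reduced_candidates: Iterable[Params],
    meets_quality: Callable[[Params], bool],
) -> Optional[Tuple[Params, List[Window]]]:
    """Return the first parameter set that yields a non-empty window set."""
    windows = find_windows(params)
    if windows:
        return params, windows
    # The window set is empty: try reduced parameter combinations in order.
    for candidate in reduced_candidates:
        if not meets_quality(candidate):
            continue  # skip combinations that would violate the quality condition
        windows = find_windows(candidate)
        if windows:
            return candidate, windows
    return None  # no acceptable combination produced an available window
```

The order of `reduced_candidates` determines which reduction is preferred, so a caller would typically list the mildest reductions first.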
A prediction processing apparatus of job scheduling, the apparatus comprising:
an acquisition module, configured to acquire job plan information of a submitted job to be executed;
a determination module, configured to determine a prediction plan at least according to the job plan information;
a prediction processing module, configured to perform prediction processing on the job to be executed according to the prediction plan to obtain a prediction result, the prediction result comprising: resource index values and/or resource states of the system under different assumed job activities in which the job to be executed is assumed to be executed in different future time periods, respectively;
a generating module, configured to generate an available execution time window set of the to-be-executed job according to the prediction result, so as to schedule the to-be-executed job based on the available execution time window set; each available execution time window in the set of available execution time windows can be used for enabling the resource state of the system resource to meet a state condition in the execution process of the job to be executed.
An electronic device, comprising:
a memory for storing a set of computer instructions;
a processor for implementing a method of predictive processing of job scheduling as claimed in any one of the preceding claims by executing a set of instructions stored on a memory.
According to the above solutions, in the prediction processing method and apparatus for job scheduling and the electronic device disclosed in the present application, after the job plan information of a submitted job to be executed is obtained, a prediction plan for the job is determined at least according to that job plan information, and prediction processing is performed according to the prediction plan. The prediction processing predicts the resource index values and/or resource states of the system under different assumed job activities, in which the job is assumed to be executed in different future time periods, to obtain a prediction result. A set of available execution time windows for the job is generated according to the prediction result, where each available execution time window in the set allows the resource state of the system resources to meet the state condition during the job activity of the job. A suitable job time window can thus be planned for the job from the set of available execution time windows, reducing resource competition with other services while the job is executed.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present application; those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flow chart of the prediction processing method for job scheduling provided by the present application;
FIG. 2 is a processing logic diagram of the method provided by the present application;
FIG. 3 is a diagram of the process of determining a prediction plan for a job to be executed, as provided by the present application;
FIG. 4 is a diagram of the process of generating a set of available execution time windows for a job to be executed, as provided by the present application;
FIG. 5 is a flow chart of another prediction processing method for job scheduling provided by the present application;
FIG. 6 is another processing logic diagram of the method provided by the present application;
FIG. 7 is a schematic diagram of the prediction processing apparatus for job scheduling provided by the present application;
FIG. 8 is a schematic structural diagram of the electronic device provided by the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In a cloud computing environment, job activities share resources from the infrastructure level to the platform level with other cloud computing jobs. Such activities may cause heavily loaded jobs to run more slowly, or even to time out or fail; moreover, many job activities in a cluster environment are difficult to stop quickly once started.
The prediction processing method for job scheduling disclosed by the application can be applied in many general-purpose or special-purpose computing environments or configurations, such as personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor apparatus, and distributed computing service platforms (e.g., cloud computing platforms) that include any of these devices or apparatus.
The processing flow of the prediction processing method for job scheduling provided by the embodiment of the present application is shown in FIG. 1, and specifically includes:
Step 101, obtaining job plan information of a submitted job to be executed.
In a job execution environment such as a cloud computing platform or a container platform, referring to FIG. 2, an engineer, a Web API (Application Programming Interface), or another program interface may submit the job plan information of a job to be executed, as required, to a job execution device deployed in the environment, such as a platform server. The job execution device thereby obtains the job plan information of the submitted job.
The job to be executed may be, but is not limited to, a model training job, an image processing job, a data storage job, and the like. The job plan information of the job to be executed includes at least execution time plan information and node deployment plan information. The execution time plan information includes parameters of the job to be executed such as a job identifier (e.g., a job number), a job type (e.g., model training), a planned job start time, a planned job duration, and an expected job end time. The node deployment plan information indicates the plan information relevant to deploying the job to be executed to the corresponding nodes, and specifically includes, but is not limited to, which server nodes the job is to be deployed to and, for each such server node, which of its virtual machines the job is to be deployed on.
In addition to the above, the job plan information may further include any one or more of: the concurrency, the sensitive resources/sensitive index ranges, and an evaluation of the load pressure that executing the job may generate inside a node (such as CPU occupation and storage IO pressure). The concurrency of the job to be executed is the preset concurrency of the job on each host, such as a server node; a sensitive resource/sensitive index of the job to be executed is a resource or index that is strongly affected by the job activity of the job.
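The job plan information described above could be represented, for illustration, by a record such as the following. All field names, the example identifiers, and the example date are hypothetical; the text only lists the kinds of information the plan contains.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class JobPlan:
    # Execution time plan information
    job_id: str                   # job identifier, e.g. a job number
    job_type: str                 # e.g. "model_training"
    planned_start: datetime
    planned_duration: timedelta
    # Node deployment plan information: server node -> virtual machines on it
    deployment: Dict[str, List[str]]
    # Optional extras mentioned in the text
    concurrency: int = 1          # preset concurrency per host
    sensitive_resources: List[str] = field(default_factory=list)

    @property
    def expected_end(self) -> datetime:
        """Expected job end time, derived from start time and duration."""
        return self.planned_start + self.planned_duration


# Hypothetical example: a two-hour training job planned for 13:00-15:00.
plan = JobPlan(
    job_id="job-001",
    job_type="model_training",
    planned_start=datetime(2021, 9, 24, 13, 0),
    planned_duration=timedelta(hours=2),
    deployment={"node-1": ["vm-a", "vm-b"], "node-2": ["vm-c"]},
    concurrency=4,
    sensitive_resources=["storage_io"],
)
```

Deriving the expected end time from the start time and duration, rather than storing it, keeps the three values consistent; a submitted plan that carries all three would need a consistency check instead.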
Step 102, determining a prediction plan at least according to the job plan information of the job to be executed.
After the job plan information of the submitted job to be executed is obtained, a prediction plan is determined at least according to that job plan information.
In this embodiment, in addition to the job plan information, the job data of a historical job corresponding to the job to be executed, and the monitoring data obtained by monitoring system resource indexes and/or resource states during execution of that historical job, are also obtained as references for the current job to be executed. For example, for a storage job, where different jobs are executed with different Block sizes, some job parameters may be referenced or directly reused. On this basis, a prediction plan is determined for the job to be executed according to its job plan information and the job data and monitoring data of the corresponding historical job.
The historical job corresponding to the job to be executed is a job that was executed in the past and is the same as or similar to the job to be executed (for example, the same job was completed in the past and is to be executed again, or the job was not completed but was upgraded, in which case the job activity before the upgrade is treated as history).
The job data of the historical job includes, but is not limited to, the execution time information of the historical job (start time, running duration, end time), its concurrency, its node deployment information, and which other jobs ran on the nodes/virtual machines it was deployed to, so that it is known when and where the historical job that is the same as or similar to the job to be executed was executed, and which other jobs were executed at the same time. The monitoring data of the historical job includes, but is not limited to, monitored values of system resource indexes and/or resource states in the execution environment, such as CPU, memory, and network resource states, obtained from the monitoring database or the job history. If the job to be executed has never been executed before, this information may be labeled empirically and used as a reference when making the prediction plan for the job to be executed.
Referring to FIG. 3, the process of determining a prediction plan for the job to be executed according to its job plan information and the job data and monitoring data of the corresponding historical job may be further implemented as:
Step 301, determining each deployment position of the job to be executed, and a first resource dependency relationship between each deployment position and the resources it depends on, according to the node deployment plan information of the job to be executed.
The deployment positions of the job to be executed include, but are not limited to, each server node (host) to which the job needs to be deployed and the corresponding virtual machines under that server node, as indicated by the node deployment plan information in the job plan information.
Correspondingly, in this step, the server nodes (hosts) to which the job to be executed needs to be deployed, and the corresponding virtual machines under those server nodes, are determined according to the node deployment plan information of the job, and are taken as the deployment positions of the job to be executed.
Then, the resource dependency relationship between each deployment position of the job to be executed and the resources it depends on is determined, based on the correspondence among the different nodes, the different virtual machines on those nodes, and the resources allocated to them in the job execution environment (such as the cloud computing platform or container platform), to obtain the first resource dependency relationship.
In implementation, the relevant cloud platform information may be obtained through the cloud platform database or API according to the node deployment plan information of the job to be executed. This information includes, but is not limited to, how many hosts (server nodes) the job to be executed involves, which virtual machines run on those hosts, the host/virtual machine sizes, and which resources the different virtual machines are connected to and in what manner. Combining this information, the resource dependency relationship between each deployment position of the job to be executed and the resources it depends on is determined, such as the dependencies among the server nodes, virtual machines, and resources used to deploy the job. For a given type of resource, different server nodes/virtual machines may depend on the same resource or on different resources of that type; for example, different server nodes/virtual machines may depend on the same storage resource or on different storage resources.
Step 302, determining, at least according to the first resource dependency relationship between each deployment position and the resources it depends on, a second resource dependency relationship between the job to be executed deployed at each position and the resources it depends on.
After the first resource dependency relationship is obtained, the second resource dependency relationship between the job to be executed deployed at each position and the resources it depends on is derived from the first resource dependency relationship in combination with the types of resources that running the job depends on, such as storage resources, computing resources, and network resources.
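The derivation of the second resource dependency relationship from the first can be sketched as a filter over resource types. The nested-dictionary layout, the position names, and the resource identifiers below are all hypothetical; the text does not prescribe a data structure.

```python
from typing import Dict, Set

# Hypothetical layout: deployment position -> {resource type: resource identifier}
Dependencies = Dict[str, Dict[str, str]]


def derive_job_dependencies(first: Dependencies, needed_types: Set[str]) -> Dependencies:
    """Restrict each position's dependencies to the resource types the job needs."""
    return {
        position: {rtype: rid for rtype, rid in deps.items() if rtype in needed_types}
        for position, deps in first.items()
    }


# First relationship: what each deployment position (node/VM) depends on.
first_deps: Dependencies = {
    "node-1/vm-a": {"storage": "san-1", "network": "vlan-10", "compute": "cpu-pool-1"},
    "node-2/vm-c": {"storage": "san-1", "network": "vlan-20", "compute": "cpu-pool-2"},
}

# Second relationship: a storage-heavy job that depends on storage and network only.
second_deps = derive_job_dependencies(first_deps, {"storage", "network"})
```

In this example both positions depend on the same storage resource `san-1`, which illustrates the case noted above where different virtual machines share one resource of a given type.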
Step 303, determining a prediction plan for the job to be executed according to the first resource dependency relationship, the second resource dependency relationship, and the job data and monitoring data of the historical job of the job to be executed.
The first resource dependency relationship, the second resource dependency relationship, and the job data and monitoring data of the historical job are then used as the data basis, and the cloud platform load prediction is planned by combining the resources, key indexes, test rules, test duration, planned completion time, and other factors reflected in this information, so that a corresponding prediction plan is generated for the job to be executed.
In this embodiment, the prediction plans generated for the job to be executed preferably include a medium-term prediction plan and a short-term prediction plan for performing prediction processing on the job.
Each of the medium-term prediction plan and the short-term prediction plan includes: a plurality of time slices obtained by dividing the corresponding prediction time range according to a preset step length, together with the prediction algorithm corresponding to each time slice and the parameters required by that algorithm.
The prediction time range of a medium-term prediction plan contains, and is larger than, the prediction time range of the corresponding short-term prediction plan. In the embodiment of the application, "medium-term" and "short-term" are relative: for a particular prediction plan A of the job to be executed, if the prediction time range of A is smaller than that of another prediction plan B of the job, then A is the short-term prediction plan and B is the medium-term prediction plan; if the prediction time range of A is larger than that of a prediction plan C of the job, then A is the medium-term prediction plan and C is the short-term prediction plan. Thus, in this embodiment, the medium-term and short-term prediction plans cover a series (e.g., 20) of prediction plans corresponding to different time ranges and time slice lengths, rather than denoting two specific prediction plans.
For example, assuming that the total execution time of a job in its prediction plan is 2 hours, prediction plans are made for the job corresponding to the time ranges of the first 20 minutes, the first 40 minutes, the first 1 hour 20 minutes, the first 1 hour 40 minutes, and the full 2 hours of the job, respectively. From the second prediction plan onward, the time range of each prediction plan contains the time ranges of all previous prediction plans, so that each prediction plan is a medium-term prediction plan relative to the plans before it and a short-term prediction plan relative to the plans after it.
In addition, this embodiment divides the time range of each medium-term or short-term prediction plan into a plurality of segments of a certain step length (time length). Each segment is regarded as a time slice of the prediction plan, and the prediction data are subsequently compared and analyzed per segment to obtain the corresponding prediction result.
The step lengths used to divide the segments may be the same or different for different short-term/medium-term prediction plans, and are not limited herein.
For the different time slice divisions and different prediction time ranges, a plurality of short-term/medium-term prediction plans are generated in batches. Each prediction plan comprises its prediction time range, the series of time slices obtained by dividing that range, the prediction algorithm (function) assigned for use within that range, and the parameters required by the prediction algorithm. For example, if the job is a deep learning activity, job parameters such as the model features, the number of iterations, the objective optimization function used, and the network structure used can serve as the parameters required by the prediction algorithm.
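The batch generation of nested prediction plans can be sketched as follows. The sketch assumes uniform 20-minute increments of the prediction time range and a single fixed slice step, which is one possible choice; the `PredictionPlan` type, the algorithm name `"moving_average"`, and the parameter dictionary are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class PredictionPlan:
    horizon_min: int                # prediction time range, in minutes from job start
    slices: List[Tuple[int, int]]   # time slices as (start, end) minute offsets
    algorithm: str                  # name of the prediction algorithm to use
    params: Dict[str, Any]          # parameters required by that algorithm


def make_plans(
    total_min: int,
    horizon_step_min: int,
    slice_step_min: int,
    algorithm: str = "moving_average",
    params: Optional[Dict[str, Any]] = None,
) -> List[PredictionPlan]:
    """Generate nested plans whose time ranges grow by a fixed increment."""
    plans = []
    for horizon in range(horizon_step_min, total_min + 1, horizon_step_min):
        # Divide this plan's time range into slices of the preset step length.
        slices = [
            (start, min(start + slice_step_min, horizon))
            for start in range(0, horizon, slice_step_min)
        ]
        plans.append(PredictionPlan(horizon, slices, algorithm, dict(params or {})))
    return plans


# A 2-hour job, plans every 20 minutes, slices of 10 minutes.
plans = make_plans(total_min=120, horizon_step_min=20, slice_step_min=10)
```

Because every plan starts at offset 0, each plan's time range contains those of all earlier plans, so any plan is "medium-term" relative to the ones before it and "short-term" relative to the ones after it.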
The prediction plan of the job to be executed includes an overall prediction time range covered by a medium-term and short-term prediction plan corresponding to the same job activity, and is related to a total execution time range of the job to be executed indicated by the time plan information of the job to be executed, for example, a plurality of time periods (including the total execution time range of the plan, or including only a part thereof and performing time extension before or after it, as appropriate) related to the total execution time range of the plan for the job to be executed are determined for the job to be executed by combining system resources, key indexes, test scales and test durations around the total execution time range of the plan (such as two hours of 13:00-15:00 in the afternoon of a certain day), and each time period is divided into a medium/short-term time range and a segment.
In the present embodiment, the prediction processing of the job to be executed is realized by assuming that the job to be executed is executed in each of the plurality of time periods, where the assumed execution of the job to be executed within one time period is regarded as one assumed job activity of the job to be executed.
Step 103, performing prediction processing on the job to be executed according to the determined prediction plan to obtain a prediction result; the prediction result includes: resource index values and/or resource states of the system under different assumed job activities, each assuming that the job to be executed is executed in a different future time period.
Once the prediction plan for the job to be executed has been determined, the relevant short-term and medium-term predictions are executed according to the different prediction plans. With reference to fig. 2, the process may be implemented as:
1) according to the prediction algorithm and parameters in the short-term prediction plan, predicting the resource index values and/or resource states of the system when the job to be executed is executed in each time slice corresponding to the short-term prediction plan, to obtain a first prediction result;
2) according to the prediction algorithm and parameters in the medium-term prediction plan, predicting the resource index values and/or resource states of the system when the job to be executed is executed in each time slice corresponding to the medium-term prediction plan, to obtain a second prediction result.
That is, assuming that the job to be executed is executed in a certain future time period, the prediction algorithm and its required parameters from each short-term/medium-term prediction plan included in that time period are used to predict, for each time slice (for example, the time point corresponding to the time slice) within the time range of that plan, the resource state (for example, overloaded or not overloaded) and/or the specific resource index values (for example, CPU usage, storage IO throughput, and the like) of system resources such as CPU, memory, network, and storage while the job activity of the job to be executed is in progress.
The respective short-term prediction and medium-term prediction processes may be executed in series or in parallel, and since there are many prediction and estimation processes to be executed, it is preferable that the respective short-term/medium-term prediction processes are executed in parallel.
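A minimal sketch of running the per-plan predictions in parallel is given below, using a thread pool from the Python standard library. The function `run_plans_in_parallel`, the plan labels, and the toy predictor `toy_cpu_percent` are hypothetical; the application does not specify how parallel execution is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_plans_in_parallel(
    plans: Dict[str, List[int]],
    predict: Callable[[int], int],
    max_workers: int = 4,
) -> Dict[str, List[int]]:
    """Run one prediction task per plan; each task covers all time slices of its plan."""

    def run_one(time_points: List[int]) -> List[int]:
        return [predict(t) for t in time_points]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(run_one, points) for name, points in plans.items()}
        # result() blocks until the corresponding plan has finished.
        return {name: future.result() for name, future in futures.items()}


def toy_cpu_percent(time_point: int) -> int:
    """Stand-in predictor (assumed): predicted CPU usage in percent."""
    return 20 + 5 * time_point


results = run_plans_in_parallel(
    {"short": [0, 1], "medium": [0, 1, 2, 3]},
    toy_cpu_percent,
)
```

Here `results["short"]` would play the role of the first prediction result and `results["medium"]` that of the second. For CPU-bound predictors a process pool may be a better fit than threads; that choice is outside the scope of this sketch.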
Step 104, generating an available execution time window set of the job to be executed according to the prediction result, and scheduling the job to be executed based on the available execution time window set; each available execution time window in the set allows the resource state of the system resources to meet the state condition during execution of the job to be executed.
The state condition may be any condition that characterizes the resource performance of the cloud computing platform or the container platform as not abnormal, for example, the CPU occupancy rate being lower than a set occupancy threshold, the IO throughput being lower than a set throughput threshold, and the like.
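For illustration, a state condition of this kind can be expressed as a set of upper thresholds on predicted resource index values. The metric names and threshold values below are assumptions chosen for the example, not values given in the application.

```python
from typing import Dict

# Assumed thresholds: CPU occupancy in percent, IO throughput in MB/s.
THRESHOLDS: Dict[str, float] = {"cpu_percent": 80.0, "io_mb_per_s": 400.0}


def satisfies_state_condition(
    predicted: Dict[str, float],
    thresholds: Dict[str, float] = THRESHOLDS,
) -> bool:
    """True only if every thresholded metric is predicted to stay below its limit.

    A metric missing from the prediction is treated as violating the condition,
    which is a conservative choice made for this sketch.
    """
    return all(
        predicted.get(name, float("inf")) < limit
        for name, limit in thresholds.items()
    )


ok = satisfies_state_condition({"cpu_percent": 55.0, "io_mb_per_s": 120.0})
overloaded = satisfies_state_condition({"cpu_percent": 91.0, "io_mb_per_s": 120.0})
```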
Referring to fig. 4, in this step, a process of generating an available execution time window set of the job to be executed according to the prediction result may be implemented as follows:
Step 401, determining whether the time point corresponding to each time slice is an available execution time point according to the first prediction result and the second prediction result.
An available execution time point is a time point at which, when the job to be executed is executed, the resource state of the system resources in the execution environment (such as the cloud computing platform or the container platform) can meet the state condition.
In this step, the first prediction result and the second prediction result may be aggregated to obtain, for the time point corresponding to each time slice, a result set of resource index value and/or resource state prediction results; whether the time point corresponding to each time slice is an available execution time point is then determined from the corresponding result set based on a predetermined strategy.
In this embodiment, the predetermined strategy is preferably a decision strategy based on a voting mechanism. Under this strategy, a time point is determined to be an available execution time point only if the prediction results of all medium-term/short-term prediction plans that include the time point indicate that the resource state of the system resources satisfies the state condition when the job to be executed is assumed to be executed at that time point (that is, the vote is unanimous); otherwise, the time point is determined to be an unavailable execution time point.
Step 402, merging the time points belonging to the same assumed job activity of the job to be executed into a job time window.
Step 403, determining whether every time point in each job time window is an available execution time point; if yes, go to step 404, otherwise, go to step 405.
Step 404, determining the job time window to be an available execution time window.
Step 405, determining the job time window to be an unavailable execution time window, or determining whether the job time window is an available execution time window through a confirmation process on the availability of the job time window.
If not every time point in a job time window is an available execution time point, the job time window may be directly determined to be an unavailable execution time window according to the strategy. Alternatively, the job time window may be submitted to an engineer for confirmation, and the engineer decides whether to keep or discard it, that is, whether the job time window is an available execution time window.
Finally, each determined available execution time window constitutes the set of available execution time windows described above.
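Steps 401 to 405 can be sketched roughly as follows. This is only an illustrative reading of the procedure: time points are modelled as integers, each vote is a tuple `(plan_id, time_point, condition_met)`, and the optional `confirm` callback stands in for the engineer's confirmation. All of these names and representations are assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Vote = Tuple[str, int, bool]   # (plan id, time point, state condition met?)
Window = Tuple[int, int]       # (first time point, last time point)


def available_points(votes: Iterable[Vote]) -> Dict[int, bool]:
    """Step 401: a time point is available only if every plan covering it agrees."""
    by_point: Dict[int, List[bool]] = defaultdict(list)
    for _plan_id, point, met in votes:
        by_point[point].append(met)
    return {point: all(mets) for point, mets in by_point.items()}


def available_windows(
    votes_by_activity: Dict[str, List[Vote]],
    confirm: Optional[Callable[[Window], bool]] = None,
) -> List[Window]:
    """Steps 402-405: one window per assumed job activity, kept if fully available."""
    windows: List[Window] = []
    for _activity, votes in votes_by_activity.items():
        points = available_points(votes)
        if not points:
            continue
        window = (min(points), max(points))       # step 402: merge time points
        if all(points.values()):                  # step 403 -> step 404
            windows.append(window)
        elif confirm is not None and confirm(window):
            windows.append(window)                # step 405: manual confirmation
    return windows


votes_by_activity = {
    "activity_a": [("short", 13, True), ("medium", 13, True), ("medium", 14, True)],
    "activity_b": [("short", 15, True), ("medium", 15, False), ("medium", 16, True)],
}
windows = available_windows(votes_by_activity)
```

In this example only the window of `activity_a` is kept, because one plan votes against time point 15 in `activity_b`.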
As can be seen from the above solution, in the method of this embodiment, after the job plan information of the submitted job to be executed is obtained, the prediction plan of the job to be executed is determined at least according to that job plan information. Prediction processing is then performed on the job to be executed according to the prediction plan: the resource index values and/or resource states of the system are predicted under different assumed job activities, each assuming that the job is executed in a different future time period, to obtain a prediction result. An available execution time window set of the job to be executed is generated from the prediction result, where each available execution time window in the set allows the resource state of the system resources to meet the state condition during the job activity of the job to be executed. A suitable job time window can therefore be planned for the job to be executed from the available execution time window set, which reduces resource competition with other services while the job is executed.
In an embodiment, the method for predicting job scheduling disclosed in the present application, when determining whether a certain job time window is an available execution time window, further includes the following processing:
for a job time window in which all time points are available execution time points, determining whether the job time window overlaps with an input time interval that cannot be used for executing the job; if there is an overlap, determining the job time window to be an unavailable execution time window; if there is no overlap, determining the job time window to be an available execution time window.
Specifically, referring to fig. 2, in a cloud computing platform or container platform environment, system operation and maintenance requirements may require a server node to be offline during a specific time period, or an important job may need to be executed on a server node during a certain time period with no other jobs allowed to run. In such cases, an engineer may mark the time intervals that cannot be used for executing the job to be executed, or the platform system may submit such time intervals based on preset operation and maintenance information or job attributes.
In this case, for each job time window whose time points are all available execution time points, whether the job time window is an available execution time window is further determined in combination with the submitted time intervals that cannot be used for executing the job to be executed. Specifically, if the job time window overlaps with such a time interval, the job time window is determined to be an unavailable execution time window; if there is no overlap, it is determined to be an available execution time window.
This embodiment further takes the system operation and maintenance plan and important job plans into account when determining the available execution time windows, so that the resulting available execution time window set is more accurate.
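The overlap check in this embodiment amounts to an interval-intersection test. The sketch below assumes half-open integer intervals `[start, end)`; that convention and the function names are assumptions made for the example.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumed convention


def overlaps(a: Interval, b: Interval) -> bool:
    """Two half-open intervals overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def exclude_blocked(
    candidate_windows: List[Interval],
    blocked_intervals: List[Interval],
) -> List[Interval]:
    """Keep only the windows that do not overlap any interval marked as unusable."""
    return [
        window
        for window in candidate_windows
        if not any(overlaps(window, blocked) for blocked in blocked_intervals)
    ]


# A maintenance interval from 14:00 to 16:00 removes the 13:00-15:00 window.
kept = exclude_blocked([(13, 15), (16, 18)], [(14, 16)])
```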
In an embodiment, referring to the flowchart of the prediction processing method for job scheduling provided in fig. 5, the prediction processing method for job scheduling disclosed in the present application, after generating the set of available execution time windows of the jobs to be executed, may further include:
Step 105, if the available execution time window set is empty, performing dimension reduction processing on the job parameters of the job to be executed, and determining a matching available execution time window set according to the job to be executed after the dimension reduction of its job parameters.
The parameter combination of the job to be executed obtained after the dimension reduction processing allows the job result of the job to be executed to meet the quality condition.
If the available execution time window set of the job to be executed is empty, the prediction indicates that no suitable future time window exists for executing the job. In this situation, this embodiment performs dimension reduction processing on the job parameters of the job to be executed, so that the job can be matched to a suitable executable time window.
Performing dimension reduction processing on the job parameters of the job to be executed includes, but is not limited to, reducing/cutting the job parameters of the job and/or reducing the values of those job parameters. Taking a deep-learning-based model training job as an example, dimension reduction of its job parameters may mean reducing the concurrency of the job, simplifying the network structure of the deep learning network, cutting model features to reduce their number, and/or reducing the number of iterations of the training process, and the like.
Specifically, referring to fig. 6, the main limited resource (critical contended resource) or index may first be identified from the prediction result corresponding to the job to be executed. For example, if no suitable executable time window is obtained because the performance of the platform (such as the cloud computing platform or the container platform) degrades due to CPU overload, excessive IO throughput, insufficient memory, insufficient network resources, or the like, then the CPU, IO throughput, memory, network resources, or the like is identified as the main limited resource or index. Then, within the parameter threshold interval allowed by the main limited resource or index, different combinations of the job parameters of the job to be executed, or different combinations of values of those parameters, are configured to obtain a Cartesian product of the adjusted job parameters. For each element of the Cartesian product, it is evaluated whether the execution result of the job under the adjusted job parameters still meets the quality condition, for example, whether the model training result still meets the set model accuracy requirement after model features are cut. This yields a dimension reduction result that both falls within the parameter threshold interval allowed by the main limited resource and meets the quality condition required of the job result of the job to be executed.
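The Cartesian-product search over adjusted job parameters could be sketched as below. The parameter names, the resource estimate in `fits_cpu`, and the quality proxy in `meets_quality` are toy assumptions; in practice both checks would come from the prediction results and from an evaluation of the job result.

```python
from itertools import product
from typing import Any, Callable, Dict, List

Params = Dict[str, Any]


def reduced_candidates(
    param_grid: Dict[str, List[Any]],
    fits_resource: Callable[[Params], bool],
    meets_quality: Callable[[Params], bool],
) -> List[Params]:
    """Enumerate the Cartesian product of candidate values and keep the
    combinations that respect the limited resource and the quality condition."""
    names = list(param_grid)
    kept: List[Params] = []
    for values in product(*(param_grid[name] for name in names)):
        combo = dict(zip(names, values))
        if fits_resource(combo) and meets_quality(combo):
            kept.append(combo)
    return kept


grid = {"concurrency": [8, 4, 2], "iterations": [1000, 500]}


def fits_cpu(p: Params) -> bool:
    # Toy estimate: each concurrent worker is assumed to cost 10% CPU, budget 40%.
    return p["concurrency"] * 10 <= 40


def meets_quality(p: Params) -> bool:
    # Toy proxy for model accuracy: total training work must stay above a floor.
    return p["concurrency"] * p["iterations"] >= 2000


candidates = reduced_candidates(grid, fits_cpu, meets_quality)
```

With these toy checks, the combinations with concurrency 8 exceed the CPU budget, and the combination of concurrency 2 with 500 iterations fails the quality proxy.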
Then, the parameters of the job to be executed after the dimension reduction processing are substituted into an automatically generated, updated job prediction plan, and prediction is executed again to obtain a suitable executable time window.
In this embodiment, for the case where no suitable job time window exists in the future for the job to be executed, the job parameters of the job are subjected to corresponding dimension reduction processing, so that a suitable job time window can be planned for the job to be executed while job quality is ensured.
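The overall fallback, predicting with the original parameters first and re-predicting with reduced parameter combinations only while the window set is empty, might look like the following sketch. The function `first_schedulable` and the stand-in predictor `toy_predict_windows` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Params = Dict[str, Any]
Window = Tuple[int, int]


def first_schedulable(
    candidates: List[Params],
    predict_windows: Callable[[Params], List[Window]],
) -> Optional[Tuple[Params, List[Window]]]:
    """Try each parameter combination in order (original first, then reduced ones)
    and return the first whose available execution time window set is non-empty."""
    for params in candidates:
        windows = predict_windows(params)
        if windows:
            return params, windows
    return None  # no combination yields an available execution time window


def toy_predict_windows(params: Params) -> List[Window]:
    # Stand-in for the full prediction pipeline (assumed behaviour):
    # only jobs with concurrency of at most 4 fit into the 13:00-15:00 window.
    return [(13, 15)] if params["concurrency"] <= 4 else []


choice = first_schedulable(
    [{"concurrency": 8}, {"concurrency": 4}, {"concurrency": 2}],
    toy_predict_windows,
)
```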
Corresponding to the above prediction processing method for job scheduling, an embodiment of the present application further discloses a prediction processing apparatus for job scheduling, where a structure of the apparatus is shown in fig. 7, and the prediction processing apparatus specifically includes:
an obtaining module 701, configured to obtain job plan information of a submitted job to be executed;
a determining module 702, configured to determine a prediction plan according to at least the job plan information;
a prediction processing module 703, configured to perform prediction processing on the job to be executed according to the prediction plan, so as to obtain a prediction result; the prediction result comprises: resource index values and/or resource states of the system under different assumed job activities, each assuming that the job to be executed is executed in a different future time period;
a generating module 704, configured to generate an available execution time window set of the job to be executed according to the prediction result, so as to schedule the job to be executed based on the available execution time window set; each available execution time window in the set of available execution time windows can be used for enabling the resource state of the system resource to meet a state condition in the execution process of the job to be executed.
In an embodiment, the determining module 702 is specifically configured to:
acquiring job data of a historical job corresponding to the job to be executed, and monitoring data obtained by monitoring system resource indexes and/or resource states during execution of the historical job;
and determining a prediction plan according to the job plan information of the job to be executed and the job data and monitoring data of the historical job.
In one embodiment, the job planning information of the job to be executed at least includes execution time planning information of the job to be executed and node deployment planning information, and the node deployment planning information is used for indicating related planning information when the job to be executed is deployed to a corresponding node;
when determining the prediction plan according to the job plan information of the job to be executed and the job data and monitoring data of the historical job, the determining module 702 is specifically configured for:
determining each deployment position of the job to be executed and a first resource dependency relationship between each deployment position and the resources it depends on according to the node deployment plan information;
determining, at least according to the first resource dependency relationship between each deployment position and the resources it depends on, a second resource dependency relationship between the job to be executed deployed at each position and the resources it depends on;
and determining a prediction plan according to the first resource dependency relationship, the second resource dependency relationship, and the job data and monitoring data of the historical job.
In one embodiment, the prediction plan of the job to be executed includes: a medium-term prediction plan and a short-term prediction plan when prediction processing is performed on a job to be executed;
wherein each of the medium-term prediction plan and the short-term prediction plan includes: a corresponding prediction time range, a plurality of time slices obtained by dividing the prediction time range according to a preset step length, and a prediction algorithm corresponding to each time slice together with the parameters required by the prediction algorithm; the prediction time range corresponding to the medium-term prediction plan contains and is larger than the prediction time range corresponding to the corresponding short-term prediction plan;
the prediction processing module 703 is specifically configured to:
according to the prediction algorithm and parameters in the short-term prediction plan, predicting the resource index values and/or resource states of the system when the job to be executed is executed in each time slice corresponding to the short-term prediction plan, to obtain a first prediction result;
and according to the prediction algorithm and parameters in the medium-term prediction plan, predicting the resource index values and/or resource states of the system when the job to be executed is executed in each time slice corresponding to the medium-term prediction plan, to obtain a second prediction result.
In an embodiment, the generating module 704 is specifically configured to:
determining whether the time point corresponding to each time slice is an available execution time point according to the first prediction result and the second prediction result; an available execution time point is a time point that allows the resource state of the system resources to meet the state condition when the job to be executed is executed;
merging the time points of the same assumed job activity into a job time window;
determining whether all time points in each job time window are available execution time points;
if so, determining the job time window to be an available execution time window; if not, determining the job time window to be an unavailable execution time window, or determining whether the job time window is an available execution time window through a confirmation process on the availability of the job time window;
and the determined available execution time windows form the available execution time window set of the job to be executed.
In an embodiment, when determining whether the time point corresponding to each time slice is the available execution time point according to the first prediction result and the second prediction result, the generating module 704 is specifically configured to:
aggregating the first prediction result and the second prediction result to obtain, for the time point corresponding to each time slice, a result set of resource index value and/or resource state prediction results;
and determining whether the time point corresponding to each time slice is an available execution time point or not based on a preset strategy according to the corresponding result set.
In one embodiment, the apparatus further comprises:
a dimension reduction processing module, configured to perform dimension reduction processing on the job parameters of the job to be executed when the available execution time window set determined for the job to be executed is empty, so as to determine a matching available execution time window set according to the job to be executed after the dimension reduction of its job parameters;
the parameter combination of the job to be executed obtained after the dimension reduction processing allows the job result of the job to be executed to meet the quality condition.
Because the prediction processing device for job scheduling disclosed in this embodiment of the present application corresponds to the prediction processing method for job scheduling disclosed in the above method embodiments, its description is relatively brief; for the related details, refer to the description of the above method embodiments, which is not repeated here.
The embodiment of the present application further discloses an electronic device, where the electronic device may be, but is not limited to, a server device in a cloud computing environment, and a composition structure of the electronic device is as shown in fig. 8, and specifically includes:
a memory 801 for storing a set of computer instructions;
the set of computer instructions may be embodied in the form of a computer program.
A processor 802 for implementing the prediction processing method for job scheduling disclosed in any of the above method embodiments by executing the set of computer instructions.
The processor 802 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or another programmable logic device.
Besides, the electronic device may further include a communication interface, a communication bus, and the like. The memory, the processor and the communication interface communicate with each other via a communication bus.
The communication interface is used for communication between the electronic device and other devices. The communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and the like.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
For convenience of description, the above system or apparatus is described as being divided into various modules or units by function. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
Finally, it is further noted that, herein, relational terms such as first, second, third, fourth, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered to fall within the protection scope of the present application.

Claims (10)

1. A method of predictive processing of job scheduling, the method comprising:
acquiring job plan information of a submitted job to be executed;
determining a prediction plan at least according to the job plan information;
performing prediction processing on the job to be executed according to the prediction plan to obtain a prediction result; the prediction result comprises: resource index values and/or resource states of the system under different assumed job activities, each assuming that the job to be executed is executed in a different future time period;
generating an available execution time window set of the job to be executed according to the prediction result so as to schedule the job to be executed based on the available execution time window set; each available execution time window in the set of available execution time windows can be used for enabling the resource state of the system resource to meet a state condition in the execution process of the job to be executed.
2. The method of claim 1, said determining a prediction plan based at least on said job plan information, comprising:
acquiring job data of a historical job corresponding to the job to be executed, and monitoring data obtained by monitoring system resource indexes and/or resource states during execution of the historical job;
and determining the prediction plan according to the job plan information and the job data and monitoring data of the historical job.
3. The method according to claim 2, wherein the job planning information at least includes execution time planning information and node deployment planning information of the job to be executed, and the node deployment planning information is used for indicating relevant planning information when the job to be executed is deployed to a corresponding node;
the determining the prediction plan according to the job plan information and the job data and monitoring data of the historical job comprises:
determining each deployment position of the job to be executed and a first resource dependency relationship between each deployment position and the resources it depends on according to the node deployment plan information;
determining, at least according to the first resource dependency relationship between each deployment position and the resources it depends on, a second resource dependency relationship between the job to be executed deployed at each position and the resources it depends on;
and determining the prediction plan according to the first resource dependency relationship, the second resource dependency relationship, and the job data and monitoring data of the historical job.
4. The method of claim 3, the predictive plan comprising: a medium-term prediction plan and a short-term prediction plan when performing prediction processing on the job to be executed;
wherein each of the medium-term prediction plan and the short-term prediction plan comprises: a corresponding prediction time range, a plurality of time slices obtained by dividing the prediction time range according to a preset step length, and a prediction algorithm corresponding to each time slice together with the parameters required by the prediction algorithm; the prediction time range corresponding to the medium-term prediction plan contains and is larger than the prediction time range corresponding to the corresponding short-term prediction plan;
the performing prediction processing on the job to be executed according to the prediction plan to obtain a prediction result comprises:
according to the prediction algorithm and parameters in the short-term prediction plan, predicting the resource index values and/or resource states of the system when the job to be executed is executed in each time slice corresponding to the short-term prediction plan, to obtain a first prediction result;
and according to the prediction algorithm and parameters in the medium-term prediction plan, predicting the resource index values and/or resource states of the system when the job to be executed is executed in each time slice corresponding to the medium-term prediction plan, to obtain a second prediction result.
5. The method of claim 4, the generating the set of available execution time windows for the job to be executed according to the prediction result, comprising:
determining whether the time point corresponding to each time slice is an available execution time point according to the first prediction result and the second prediction result; an available execution time point is a time point that allows the resource state of the system resources to meet the state condition when the job to be executed is executed;
merging the time points of the same assumed job activity into a job time window;
determining whether all time points in each job time window are available execution time points;
if so, determining the job time window to be an available execution time window; if not, determining the job time window to be an unavailable execution time window, or determining whether the job time window is an available execution time window through a confirmation process on the availability of the job time window;
wherein each determined available execution time window constitutes the set of available execution time windows.
6. The method of claim 5, wherein determining whether a time point corresponding to each time slice is an available execution time point according to the first prediction result and the second prediction result comprises:
aggregating the first prediction result and the second prediction result to obtain, for the time point corresponding to each time slice, a result set of resource index value and/or resource state prediction results;
and determining whether the time point corresponding to each time slice is an available execution time point or not based on a preset strategy according to the corresponding result set.
7. The method of claim 5, further comprising:
for a job time window in which all time points are available execution time points, determining whether the job time window overlaps with an input time interval that cannot be used for executing the job;
if there is an overlap, determining the job time window to be an unavailable execution time window;
and if there is no overlap, determining the job time window to be an available execution time window.
8. The method of claim 1, after generating the set of available execution time windows for the job to be executed, further comprising:
if the set of available execution time windows is empty, performing dimensionality reduction on the job parameters of the job to be executed, and determining a matching set of available execution time windows according to the job to be executed with the reduced job parameters;
wherein the parameter combination of the job to be executed obtained by the dimensionality reduction enables the result of the job to be executed to satisfy a quality condition.
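The fallback in claim 8 can be sketched as a search over reduced parameter combinations. Everything here is an assumption made for illustration: job parameters are modelled as a dictionary of integers, the reduced candidates are supplied by the caller in order of preference, and `find_windows` and `meets_quality` stand in for the window search and the quality condition, neither of which is specified in this section.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Params = Dict[str, int]
Window = Tuple[int, int]


def schedule_with_fallback(
    params: Params,
    reduced_candidates: Iterable[Params],
    find_windows: Callable[[Params], List[Window]],
    meets_quality: Callable[[Params], bool],
) -> Optional[Tuple[Params, List[Window]]]:
    """Try the submitted parameters first, then reduced ones that keep quality."""
    windows = find_windows(params)
    if windows:
        return params, windows
    for candidate in reduced_candidates:
        # Skip reductions that would violate the quality condition.
        if not meets_quality(candidate):
            continue
        windows = find_windows(candidate)
        if windows:
            return candidate, windows
    return None  # no acceptable parameter combination yields a window
```

Returning the chosen parameters together with the windows lets the caller see which reduction was applied before the job is scheduled.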
9. A prediction processing apparatus for job scheduling, the apparatus comprising:
an acquisition module, configured to acquire submitted job plan information of a job to be executed;
a determination module, configured to determine a prediction plan based at least on the job plan information;
a prediction processing module, configured to perform prediction processing on the job to be executed according to the prediction plan to obtain a prediction result; the prediction result comprises: resource index values and/or resource states of the system under different assumed job activities, in which the job to be executed is assumed to be executed in different future time periods, respectively;
a generating module, configured to generate a set of available execution time windows for the job to be executed according to the prediction result, so that the job to be executed is scheduled based on the set of available execution time windows; during each available execution time window in the set, executing the job to be executed leaves the resource state of the system resource satisfying a state condition.
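The four modules of claim 9 form a straight pipeline, which can be sketched as a small composition of callables. The class name `PredictionApparatus`, the field names, the data shapes (a dictionary for plan information, a per-slice boolean for the prediction result) and the helper `runs_of_true` are all hypothetical simplifications chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Window = Tuple[int, int]


def runs_of_true(flags: Dict[int, bool]) -> List[Window]:
    """Merge consecutive available time slices into (start, end) windows."""
    windows, start, prev = [], None, None
    for t in sorted(flags):
        if flags[t]:
            if start is None:
                start = t
            prev = t
        elif start is not None:
            windows.append((start, prev))
            start = None
    if start is not None:
        windows.append((start, prev))
    return windows


@dataclass
class PredictionApparatus:
    acquire: Callable[[], Dict]                       # acquisition module
    determine_plan: Callable[[Dict], Dict]            # determination module
    predict: Callable[[Dict, Dict], Dict[int, bool]]  # prediction processing module
    generate: Callable[[Dict[int, bool]], List[Window]] = runs_of_true  # generating module

    def run(self) -> List[Window]:
        job_plan = self.acquire()
        prediction_plan = self.determine_plan(job_plan)
        prediction = self.predict(job_plan, prediction_plan)
        return self.generate(prediction)
```

A usage example with stub modules, where slice 2 is predicted to violate the state condition:

```python
apparatus = PredictionApparatus(
    acquire=lambda: {"job": "j1", "horizon": 4},
    determine_plan=lambda jp: {"slices": range(jp["horizon"])},
    predict=lambda jp, plan: {t: t != 2 for t in plan["slices"]},
)
print(apparatus.run())  # [(0, 1), (3, 3)]
```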
10. An electronic device, comprising:
a memory for storing a set of computer instructions;
a processor for implementing the prediction processing method for job scheduling of any of claims 1 to 8 by executing the set of computer instructions stored in the memory.
CN202111123438.5A 2021-09-24 2021-09-24 Job scheduling prediction processing method and device and electronic equipment Pending CN113850428A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111123438.5A CN113850428A (en) 2021-09-24 2021-09-24 Job scheduling prediction processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111123438.5A CN113850428A (en) 2021-09-24 2021-09-24 Job scheduling prediction processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN113850428A true CN113850428A (en) 2021-12-28

Family

ID=78979367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111123438.5A Pending CN113850428A (en) 2021-09-24 2021-09-24 Job scheduling prediction processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113850428A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023185825A1 (en) * 2022-03-30 2023-10-05 阿里巴巴(中国)有限公司 Scheduling method, first computing node, second computing node, and scheduling system

Similar Documents

Publication Publication Date Title
US11132288B2 (en) Data-driven scheduling of automated software program test suites
US11119878B2 (en) System to manage economics and operational dynamics of IT systems and infrastructure in a multi-vendor service environment
US11531909B2 (en) Computer system and method for machine learning or inference
US11023325B2 (en) Resolving and preventing computer system failures caused by changes to the installed software
US7797141B2 (en) Predictive analysis of availability of systems and/or system components
US7864679B2 (en) System utilization rate managing apparatus and system utilization rate managing method to be employed for it, and its program
US10909503B1 (en) Snapshots to train prediction models and improve workflow execution
CN109586954B (en) Network traffic prediction method and device and electronic equipment
CN111381970B (en) Cluster task resource allocation method and device, computer device and storage medium
EP3932025B1 (en) Computing resource scheduling method, scheduler, internet of things system, and computer readable medium
CN110782706B (en) Early warning method and device for driving risk of intelligent vehicle
CN115269108A (en) Data processing method, device and equipment
CN109992408B (en) Resource allocation method, device, electronic equipment and storage medium
CN113850428A (en) Job scheduling prediction processing method and device and electronic equipment
CN113485833B (en) Resource prediction method and device
WO2020206699A1 (en) Predicting virtual machine allocation failures on server node clusters
US11119879B2 (en) Detection of resource bottlenecks in execution of workflow tasks using provenance data
CN110389817B (en) Scheduling method, device and computer readable medium of multi-cloud system
CN112685390B (en) Database instance management method and device and computing equipment
CN110008098B (en) Method and device for evaluating operation condition of nodes in business process
CN113485933A (en) Automatic testing method and distributed system
US20160224378A1 (en) Method to control deployment of a program across a cluster of machines
US11855849B1 (en) Artificial intelligence based self-organizing event-action management system for large-scale networks
CN115022173B (en) Service capacity expansion method, device, equipment and storage medium
CN113051749B (en) Aircraft reliability data asset metadata decomposition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination