CN114546623A - Task scheduling method and system based on big data system - Google Patents

Task scheduling method and system based on big data system

Info

Publication number
CN114546623A
CN114546623A
Authority
CN
China
Prior art keywords
task
scheduling
tasks
big data
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210192697.1A
Other languages
Chinese (zh)
Other versions
CN114546623B (en)
Inventor
周希武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Second Peoples Hospital of Huaian
Original Assignee
Second Peoples Hospital of Huaian
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Second Peoples Hospital of Huaian
Priority to CN202210192697.1A
Publication of CN114546623A
Application granted
Publication of CN114546623B
Legal status: Active (current)
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a task scheduling method and system based on a big data system. The method comprises: creating a task based on a task creation request; calculating a task feature matrix and placing the task into a task pool; acquiring an available computing node set; selecting a target task set from the task pool based on the available computing node set; and assigning the tasks in the target task set to the available computing node set. The invention performs task scheduling based on set-to-set matching, which improves the flexibility of task scheduling, allows the scheduling to reach a local optimum where real-time constraints permit, makes complex feature description feasible under a limited computation budget, and greatly improves task scheduling efficiency in a big data system environment.

Description

Task scheduling method and system based on big data system
[ technical field ]
The invention belongs to the field of big data, and particularly relates to a task scheduling method and system based on a big data system.
[ background of the invention ]
With technological progress, information development has entered the big data era. Medical care is an important field for big data: with the rapid development of medical information technology, more and more medical institutions in China are accelerating the construction of integrated information platforms in order to improve hospitals' service level and core competitiveness. Big data applications have surged as new medical reforms begin, and when rapidly growing, diversified hospital data meets big data technology, medical big data applications are quickly welcomed by medical institutions. A large hospital typically has hundreds of systems running online, so when a hospital information center faces the impact of big data, the first problems are how to integrate information and how to combine the characteristics of hospital big data with a big data platform or big data system, so as to exploit the advantages and efficiency of big data. In fact, medical information is strongly correlated: there are correlations between patients, between different periods of the same patient, and between doctors, yet these correlations may not be fully understood and discovered through semantic understanding alone. How to exploit the inherent relevance of big data information from higher levels, for example the operating system layer, the task scheduling layer, and possibly a future hardware platform layer, so as to improve the overall operating efficiency of big data, is therefore a new research direction.
A traditional task scheduling system generally only supports executing an entire job in the order of a directed acyclic graph: a task node is triggered for execution only when all of its upstream nodes have been triggered and executed successfully, and the tasks are likewise executed in sequence. This makes full use neither of the data information contained in big data nor of current big data analysis capabilities for mining valuable new information; in other words, the task execution logic in the related art is not flexible enough. A big data system can collect and analyze preset information, which includes the data of the underlying application systems; it is mainly responsible for collecting and analyzing the data of the whole service platform, chiefly the acquisition and comprehensive analysis of the various data of the basic application systems. Therefore, it is urgent to optimize the conventional task scheduling system and to exploit the characteristics of big data, namely the large number of nodes and tasks and the high flexibility, so as to improve the system's usability for developers and its task execution efficiency.
The invention performs task scheduling based on set-to-set matching, which improves the flexibility of task scheduling and allows the scheduling to reach a local optimum where real-time constraints permit. A task feature matrix, a worst task matrix and an optimal task matrix are introduced; based on this representation, matching between computing nodes and tasks is converted into matching between tasks and tasks, so that complex feature description becomes feasible under a limited computation budget and task scheduling efficiency is greatly improved. In addition, the available states of the computing nodes are conveyed through state variables, and the state variables are modified by addition and subtraction, so that the scheduling node can accurately and timely construct the available computing node set when the opportunity arrives; furthermore, differentiated placement of the state variables provides differentiated information transfer for the various scheduling parties.
[ summary of the invention ]
In order to solve the above problems in the prior art, the present invention provides a task scheduling method and system based on a big data system, as follows:
a task scheduling method based on a big data system, the method comprising:
step S1: creating a task based on the task creation request;
step S2: calculating a task characteristic matrix, and putting the tasks into a task pool;
step S3: acquiring an available computing node set;
step S4: selecting a target task set from a task pool based on the available compute node sets;
step S5: assigning tasks in a target set of tasks to the set of available compute nodes.
Further, the task features and the tasks are associated and placed into a task pool.
Further, the tasks include a data reading task, a database reading task, a data calculating task, a data analyzing task, a database data updating task and the like.
Further, the task creation request comes from one or more terminal devices.
Further, the task is a big data platform task.
A big data system based task scheduling system, comprising: the system comprises terminal equipment, a computing node and a scheduling node;
the terminal equipment is used for initiating a task creating request to the scheduling node;
the computing node is used for receiving task scheduling and executing tasks;
the scheduling node includes:
a creation module for creating a task based on the task creation request;
the task characteristic matrix calculation module is used for calculating a task characteristic matrix and putting the tasks into a task pool;
the available computing node determining module is used for acquiring an available computing node set;
the target task set selection module is used for selecting a target task set from the task pool based on the available computing node set;
an assignment module to assign tasks in a target set of tasks to the set of available computing nodes.
A processor is used for running a program, wherein the program executes the task scheduling method based on the big data system during running.
An execution device comprising a processor coupled to a memory, the memory storing program instructions that, when executed by the processor, implement the big data system based task scheduling method.
A computer-readable storage medium comprising a program which, when run on a computer, causes the computer to perform the big data system based task scheduling method.
A cloud server is configured to execute the big data system-based task scheduling method.
The beneficial effects of the invention include:
(1) Task scheduling is performed based on set-to-set matching, which improves the flexibility of task scheduling and allows the scheduling to reach a local optimum where real-time constraints permit; meanwhile, matching between computing nodes and tasks is converted into matching between tasks and tasks through the worst task matrix and the optimal task matrix, so that complex feature description becomes feasible under a limited computation budget and task scheduling efficiency is greatly improved. (2) The available states of the computing nodes are conveyed through state variables, and the state variables are modified by addition and subtraction, so that the scheduling node can accurately and timely construct the available computing node set when the opportunity arrives; furthermore, differentiated placement of the state variables provides differentiated information transfer for the various scheduling parties.
[ description of the drawings ]
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, and are not to be considered limiting of the invention, in which:
fig. 1 is a schematic diagram of a task scheduling method based on a big data system according to the present invention.
[ detailed description of the embodiments ]
The invention will be described in detail below with reference to the drawings and specific embodiments; the exemplary embodiments and the description are only for the purpose of illustrating the invention and are not to be construed as limiting the invention.
A big data system can collect and analyze data, mainly the acquisition and comprehensive analysis of the various data of the basic application systems, and tasks must be created and reasonably scheduled for the acquisition, analysis and synthesis of those data. For the diversified tasks and the computing nodes whose states change at any time in a big data system, how to schedule tasks is crucial. The computing nodes are located in the same or different hardware devices and are connected to each other by a bus, internal circuits or a network connection; the scheduling node is used to create a task based on a task creation request submitted by a terminal device and to schedule the task to a suitable computing node for execution.
the invention provides a task scheduling method based on a big data system, which comprises the following steps:
step S1: creating a task based on the task creation request;
Preferably: the tasks comprise a data reading task, a database reading task, a data calculation task, a data analysis task, a database data updating task and the like;
Preferably: the task is a big data platform task;
preferably: the task creation request comes from one or more terminal devices;
preferably: the scheduling node creates a task based on the task creation request; for example: the scheduling node is a task scheduling server;
step S2: calculating task characteristics, associating the task characteristics with the tasks and putting the task characteristics and the tasks into a task pool;
Calculating the task features specifically comprises: acquiring multiple types of task features, wherein each type of task feature comprises one or more sub-features; after the sub-features are normalized, each type of feature is expressed as a vector; finally, the vectors corresponding to the several types are combined to form a matrix, and the task features are expressed by this task feature matrix; when the numbers of sub-features differ across types, the shorter vectors are padded with 0 so that all vectors have the same length;
Preferably: the task features comprise contour features, data features and execution features; the task contour features, data features and execution features are converted into a contour feature vector, a data feature vector and an execution feature vector respectively; for each type of feature, numerical sub-features are normalized and used as elements of the vector, and non-numerical sub-features are first converted into numerical values and then used as elements of the vector; for example, the contour features include: task type, owner, real-time flag, number of temporary variables, and/or task execution time, etc.; task type TypeA is converted into the value 1, TypeB into the value 2, and so on; when there are 10 task types, the normalized vector element corresponding to the task-type sub-feature is 0.1; through normalization, all elements of the vector lie in [0,1];
Preferably: the task execution block is used to acquire the contour features of the task;
The data features are used to represent characteristics of the data involved in executing a task, for example: the number of static variables, the number of temporary variables, the length of data files, the size of the task's data space, and/or the like;
The execution features are used to represent characteristics of the task during execution, for example: characteristics of the task execution graph, the number of loops, the length of loop variables, the length of task execution paths, the number of functions, and/or the like;
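As an illustration of the feature-matrix construction described above, the following sketch (not the patent's reference implementation; the feature names, normalization bounds and use of NumPy are assumptions) normalizes sub-features into [0,1], pads the shorter feature vectors with zeros, and stacks the three feature classes into a task feature matrix:

# Illustrative sketch of the construction described above: each feature class
# (contour, data, execution) becomes one row of normalized sub-features in
# [0, 1], shorter rows are padded with zeros, and the rows are stacked into a
# task feature matrix. Feature names and bounds are assumed for illustration.
import numpy as np

def normalize(value, lo, hi):
    """Map a numeric sub-feature into [0, 1]; non-numeric sub-features are
    assumed to have been mapped to numbers beforehand (e.g. TypeA -> 1)."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)

def task_feature_matrix(contour, data, execution):
    """contour/data/execution: lists of already-normalized sub-features."""
    rows = [list(contour), list(data), list(execution)]
    width = max(len(r) for r in rows)
    padded = [r + [0.0] * (width - len(r)) for r in rows]  # pad with 0
    return np.array(padded)

# Example: a task of type 3 out of 10 types, owner 2 of 5, real-time flag set.
contour = [normalize(3, 0, 10), normalize(2, 0, 5), 1.0]
data = [normalize(12, 0, 100), normalize(4096, 0, 65536)]
execution = [normalize(2, 0, 8), normalize(150, 0, 1000), normalize(7, 0, 32)]
T = task_feature_matrix(contour, data, execution)   # shape (3, 3)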
The task pool is used to manage tasks waiting to be scheduled; it can be managed as a queue, and the tasks in the task pool are sorted by priority; the priority here is a scheduling priority, which can be calculated by any of various existing methods;
Preferably: a latest scheduling time is set for each task; for a real-time task the latest scheduling time is related to its real-time characteristics, and when a task is not a real-time task and no latest scheduling time is set, a timer of default length is used to compute the latest scheduling time; in this way, no task is left unscheduled forever because of the set-based local optimization;
Step S3: acquiring an available computing node set; specifically: availability information is conveyed through the value of a state variable; when a computing node becomes available, the computing node modifies the corresponding state variable; the scheduling server reads the state variable to obtain the availability of the computing nodes and constructs the current available computing node set on that basis;
When the computing node and the scheduling node communicate over internal lines such as a bus, exact and accurate information can be obtained; in a big data system, however, the communication between the computing nodes and the scheduling node is not so precise and definite. The invention conveys availability information through state variable values, so that even if the information of a computing node is occasionally read incorrectly, only one scheduling round is wasted for that node; and because the state variable can be read and written independently by the computing node and the scheduling node, great flexibility is obtained. On this basis, the state-variable mechanism reduces the communication overhead, dispenses with complicated handshake operations, and conveys the available state of each computing node so that the scheduling node can accurately and timely construct the available computing node set when the opportunity arrives;
Preferably: the state variable is set as an integer variable; the state variable is modified by addition and subtraction to express different availability information; different computing nodes modify the state variable by adding or subtracting their own fixed values to express changes in their availability, and the fixed values corresponding to different computing nodes are different; for example: a 32-bit integer variable can record the changes of 32 computing nodes; the fixed value corresponding to the first computing node is 1, the fixed value corresponding to the second computing node is binary 10, and so on, so the state of the first computing node is modified by adding or subtracting 1; if the current state variable is 00000111 and 1 identifies 'available', the state variable indicates that the first, second and third computing nodes are available and the other computing nodes are not; in this way the overhead of conveying state is minimized, and the number of state variables required grows only very slowly as the number of computing nodes under scheduling management increases;
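The bit-level behaviour described above can be illustrated with a minimal sketch, under the assumption (consistent with the 00000111 example) that each computing node's fixed value is a distinct power of two; the class and method names are illustrative, not taken from the patent:

# Minimal sketch: adding a node's fixed value marks it available, subtracting
# it marks it unavailable; the scheduling node decodes the integer to build
# the currently available computing node set.
class AvailabilityStateVariable:
    def __init__(self):
        self.value = 0  # the shared integer state variable

    def fixed_value(self, node_index):
        return 1 << node_index            # node 0 -> 1, node 1 -> binary 10, ...

    def mark_available(self, node_index):
        if not self.is_available(node_index):
            self.value += self.fixed_value(node_index)   # add the fixed value

    def mark_unavailable(self, node_index):
        if self.is_available(node_index):
            self.value -= self.fixed_value(node_index)   # subtract the fixed value

    def is_available(self, node_index):
        return (self.value >> node_index) & 1 == 1

    def available_set(self, node_count):
        # The scheduling node reads the state variable and builds the
        # currently available computing node set from it.
        return {i for i in range(node_count) if self.is_available(i)}

state = AvailabilityStateVariable()
state.mark_available(0); state.mark_available(1); state.mark_available(2)
assert state.value == 0b00000111
assert state.available_set(32) == {0, 1, 2}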
Correspondingly, differentiated support can be provided for different computing nodes and for task scheduling through differentiated placement of the state variables;
Preferably: there are one or more state variables; when there are multiple state variables, they are placed at different storage positions of the same node or at different nodes;
the state variables are placed in different nodes, specifically: a state variable corresponds to an available state representation of one or more compute nodes; the first computing node set corresponds to a first state variable; the second computing node set corresponds to a second state variable; one or more computing nodes are arranged in the first computing node set and the second computing node set, and the computing nodes are targets of task scheduling; the first state variable is placed in the first node, and the second state variable is placed in the second node; the first node and the second node are different physical nodes; by analogy, more state variables are correspondingly set aiming at more differentiation requirements, namely more calculation node sets, and the state variables are set at the same or different nodes; the nodes are big data system nodes and comprise various computer entities such as computing nodes, scheduling nodes, security nodes, computing cores, processors and the like;
A more specific way is to place the state variables at the scheduling node or at different nodes respectively; such a node can be one providing a higher security level or different security levels, or, for an important computing node, the corresponding state variable can be placed directly on that node, so that the scheduling node, after obtaining access rights to a specific area, actively reads the information from the important computing node, reducing the other work and types of work of that node; wherein the specific area comprises the area where the state variable is located;
Placing the state variables at different storage positions of the same node specifically comprises: a state variable corresponds to the state representation of one or more computing nodes; a first state variable corresponding to the first computing node set is placed at a first position of the first node, and a second state variable corresponding to the second computing node set is placed at a second position of the first node; the first position and the second position are different storage positions of the first node, located on different storage units; the different storage units include: caches, registers, memories, different memory banks, encrypted storage units, unencrypted storage units and/or extended storage units, etc.;
A more specific way is to set both the first state variable and the second state variable on the scheduling node, the computing nodes to which scheduling is directed being different computing units on the same computing device;
preferably: the different computing units are different processors and different computing cores on the same computing device; the different computing units are connected on the basis of a bus; the state variables corresponding to different computing cores are placed in a cache, and the state variables corresponding to different processors are placed in a memory;
Step S4: selecting a target task set from the task pool based on the available computing node set; specifically: calculating the node set features of the available computing node set; selecting target tasks from the task pool to form a target task set such that the task set features of the target task set match the node set features to the highest degree;
the step S4 specifically includes the following steps:
step S41: for each computing node in the available computing node set, acquiring an optimal task matrix and a worst task matrix corresponding to the computing node;
One specific acquisition mode is as follows: various tasks are executed on the computing node and their execution efficiency is recorded; the task with the highest execution efficiency and the task with the lowest execution efficiency are determined, and their feature matrices are extracted to obtain the optimal task matrix and the worst task matrix; the computing node identifier is stored in association with its optimal task matrix and worst task matrix;
The execution efficiency can be one indicator or a combination of several indicators, such as execution time and resource utilization;
The optimal task matrix of a computing node is the task feature matrix corresponding to the task with the highest execution efficiency on that node; the worst task matrix of a computing node is the task feature matrix corresponding to the task with the lowest execution efficiency on that node; since the configurations of the computing nodes may be the same or different, the optimal and worst task matrices of different computing nodes may likewise be the same or different;
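A hedged sketch of this profiling-based acquisition might look as follows; the efficiency scoring and the run_task/efficiency callbacks are assumptions introduced for illustration, not details from the patent:

# Run a set of benchmark tasks on a computing node, score each run's
# execution efficiency (e.g. from execution time and resource utilization),
# and keep the feature matrices of the most and least efficient tasks as the
# node's optimal (MXG) and worst (MXB) task matrices.
def profile_node(node_id, benchmark_tasks, run_task, efficiency):
    """benchmark_tasks: list of (feature_matrix, task) pairs.
    run_task(node_id, task) -> execution record; efficiency(record) -> float."""
    scored = []
    for feature_matrix, task in benchmark_tasks:
        record = run_task(node_id, task)
        scored.append((efficiency(record), feature_matrix))
    scored.sort(key=lambda pair: pair[0])
    worst_matrix = scored[0][1]     # MXB: lowest execution efficiency
    best_matrix = scored[-1][1]     # MXG: highest execution efficiency
    # Store the node identifier in association with its two matrices.
    return {"node": node_id, "MXG": best_matrix, "MXB": worst_matrix}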
In the prior art, the features of a computing node, such as its storage and computation characteristics, are described, and the degree of matching is determined by matching the node features against the task features; in practice, however, as the number of features available for description grows, describing the degree of matching between node features and task features becomes difficult, and measuring it by artificial-intelligence means such as neural networks makes the amount of computation grow explosively as more descriptive features are added;
Step S42: selecting target tasks from the task pool to form a target task set; specifically: when this step is entered for the first time, a target number of tasks are selected from the task pool as target tasks in descending order of priority, wherein the target number is equal to the number of nodes in the available computing node set; when this step is entered again, a first number of target tasks are newly selected from the task pool and placed into the target task set, and a first number of target tasks are swapped out of the target task set and returned to the task pool;
Preferably: a first number of target tasks are selected from the task pool in descending order of priority and placed into the target task set, and a first number of target tasks are swapped out of the target task set and returned to the task pool; that is, a first number of tasks are replaced each time;
Preferably: the first number is equal to 1;
Swapping the first number of target tasks out of the target task set specifically comprises: selecting the first number of lowest-priority target tasks from the target task set to swap out;
Preferably: when this step is entered again, it is first determined whether any target task can be swapped out; if so, execution continues; otherwise the target task set is not changed and the process proceeds to step S45; when the latest scheduling time of every task in the target task set has arrived, no task can be swapped out any more; similarly, when tasks are swapped in, tasks whose latest scheduling time has arrived will, owing to their higher priority, necessarily be selected into or preferentially swapped into the target task set;
Preferably: if the number of target tasks that can be swapped out is less than the first number, the first number is reset to the number of target tasks that can be swapped out, and the first number is reset again the next time this step is entered;
Reselecting a first number of target tasks from the task pool to place into the target task set, and swapping a first number of target tasks out of the target task set back into the task pool, specifically comprises: among the tasks in the task pool that have not yet been selected, the first number of highest-priority tasks are chosen as target tasks and placed into the target task set for the subsequent matching degree calculation; at the same time, a first number of target tasks must be swapped out, i.e. deleted from the set; one way of selecting the tasks to swap out is to choose the first number of lowest-priority target tasks from the target task set; the swapped-out tasks are returned to the task pool and are no longer target tasks, and they may be selected as target tasks again when the next scheduling opportunity arrives;
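The selection and swapping procedure of step S42 described above can be sketched as follows; the task dictionary shape ("id", "priority", "T", optional "latest_time") and the tried_ids set used to avoid re-selecting swapped-out tasks are assumptions made for this illustration:

# Illustrative sketch of step S42: the first pass takes the highest-priority
# tasks, one per available computing node; later passes swap the
# lowest-priority swappable target tasks back into the pool and pull in the
# next highest-priority tasks not yet tried.
import heapq

def initial_target_set(task_pool, node_count):
    """First entry: take `node_count` tasks in descending priority order."""
    chosen = heapq.nlargest(node_count, task_pool, key=lambda t: t["priority"])
    for t in chosen:
        task_pool.remove(t)
    return chosen

def swap_targets(task_pool, target_set, tried_ids, first_number=1, now=0):
    """Swap out the lowest-priority swappable targets and swap in fresh ones.
    A task whose latest scheduling time has arrived is treated as not
    swappable; tasks already swapped out once are not re-selected."""
    swappable = [t for t in target_set if t.get("latest_time", float("inf")) > now]
    candidates = [t for t in task_pool if t["id"] not in tried_ids]
    first_number = min(first_number, len(swappable), len(candidates))
    if first_number == 0:
        return False  # nothing to replace; proceed to step S45
    out = heapq.nsmallest(first_number, swappable, key=lambda t: t["priority"])
    new = heapq.nlargest(first_number, candidates, key=lambda t: t["priority"])
    for t in out:
        target_set.remove(t)
        task_pool.append(t)        # swapped-out tasks return to the task pool
        tried_ids.add(t["id"])
    for t in new:
        task_pool.remove(t)
        target_set.append(t)
    return True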
Step S43: calculating the feature matching degree between the target task set and the available computing node set; when the feature matching degree is higher than a preset matching degree, the process proceeds to step S45; otherwise, it returns to step S42;
Calculating the feature matching degree between the target task set and the available computing node set specifically comprises:
Step S43a1: constructing the current scheduling mode SCH_CNT, and proceeding to step S43a2;
Constructing the current scheduling mode SCH_CNT specifically comprises: for each task in the target task set, in descending order of task priority, calculating the distance between the task's feature matrix and the optimal feature matrix of each node, and scheduling the task to the computing node whose optimal feature matrix has the smallest distance; this scheduling result is taken as the current scheduling mode SCH_CNT;
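A sketch of this greedy construction of SCH_CNT follows, assuming the Frobenius norm as the distance measure and at most one task per computing node; both are assumptions made for illustration, and the names follow the earlier sketches:

# Tasks are taken in descending priority and each is assigned to the free
# computing node whose optimal feature matrix MXG is closest to the task's
# feature matrix; the resulting mapping is the current scheduling mode.
import numpy as np

def build_sch_cnt(target_tasks, node_profiles):
    """target_tasks: list of {"id", "priority", "T"} with T a feature matrix.
    node_profiles: list of {"node", "MXG", "MXB"} as produced by profiling."""
    assignment = {}
    free_nodes = list(node_profiles)
    for task in sorted(target_tasks, key=lambda t: t["priority"], reverse=True):
        if not free_nodes:
            break
        best = min(free_nodes,
                   key=lambda p: np.linalg.norm(task["T"] - p["MXG"]))
        assignment[task["id"]] = best["node"]
        free_nodes.remove(best)    # one task per node in this simple sketch
    return assignment              # this mapping is the current scheduling mode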
Step S43a2: calculating the feature matching degree MH under the current scheduling mode;
[Formula image BDA0003525520740000051: the expression for the feature matching degree MH in terms of the per-node matching degrees MHi; not reproduced as text in this publication]
wherein: MHi is the feature matching degree of the i-th computing node under the current scheduling mode; MH is the feature matching degree between the target task set and the available computing node set under the current scheduling mode; n is the number of target tasks; MXGi is the optimal feature matrix of the i-th computing node; MXBi is the worst feature matrix of the i-th computing node; Tk is the task feature matrix of task k under the current scheduling mode; | | denotes taking the norm of the matrix; here task k is the task assigned to the i-th computing node;
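The formula itself is only available as an image in the publication; one plausible form that is consistent with the variable definitions above, given purely as an assumption for illustration and not as the patent's exact expression, normalizes each assigned task's distances to the node's worst and optimal matrices and averages over the n target tasks:

MH_i = \frac{\lVert T_k - MXB_i \rVert}{\lVert T_k - MXG_i \rVert + \lVert T_k - MXB_i \rVert},
\qquad
MH = \frac{1}{n} \sum_{i=1}^{n} MH_i

Under this assumed form, MH_i approaches 1 when the task assigned to node i resembles that node's optimal task matrix and approaches 0 when it resembles the worst task matrix, so a higher MH indicates a better set-to-set match.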
Step S43a3: outputting the current scheduling mode and the corresponding feature matching degree;
Through this process of local optimization within the set, the scheduling strategy is optimized while priority-based scheduling is still guaranteed;
Step S45: taking the target task set as the selected target task set and outputting it; specifically: the target task sets visited during the above steps and their corresponding feature matching degrees are recorded, and the target task set with the highest feature matching degree is taken as the selected target task set and output;
preferably: outputting a target task set and a corresponding scheduling mode in a correlated manner;
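Putting the pieces together, the local-optimization loop of steps S42 to S45 described above can be sketched as follows, reusing the illustrative helpers from the earlier sketches (initial_target_set, swap_targets, build_sch_cnt) together with the assumed matching-degree formula; none of these names or formulas are taken from the patent:

# Repeatedly swap candidate tasks into the target set, rebuild the scheduling
# mode, and keep the target set with the highest feature matching degree.
import numpy as np

def match_degree(target_tasks, mode, node_profiles):
    by_node = {p["node"]: p for p in node_profiles}
    scores = []
    for task in target_tasks:
        node = mode.get(task["id"])
        if node is None:
            continue
        p = by_node[node]
        d_best = np.linalg.norm(task["T"] - p["MXG"])
        d_worst = np.linalg.norm(task["T"] - p["MXB"])
        scores.append(d_worst / (d_best + d_worst + 1e-12))  # assumed MHi form
    return sum(scores) / max(len(scores), 1)                 # assumed MH aggregation

def select_target_set(task_pool, node_profiles, threshold, now=0):
    target_set = initial_target_set(task_pool, len(node_profiles))
    tried_ids = set()
    best = None   # (matching degree, target task set snapshot, scheduling mode)
    while True:
        mode = build_sch_cnt(target_set, node_profiles)
        mh = match_degree(target_set, mode, node_profiles)
        if best is None or mh > best[0]:
            best = (mh, list(target_set), mode)
        if mh >= threshold:
            return best            # step S43: preset matching degree reached
        if not swap_targets(task_pool, target_set, tried_ids, now=now):
            return best            # step S45: no swappable target task remains

The scheduling mode returned alongside the best target task set is then what step S5 would use to dispatch the tasks to the available computing nodes.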
The above scheduling assigns one task to one computing node; of course, when a task is large, assigning the entire task to a single computing unit may leave that unit unable to respond and the task unable to be completed in time;
Step S5: assigning the tasks in the target task set to the available computing node set; specifically: the tasks in the target task set are assigned to the available computing node set according to the current scheduling mode;
Of course, once the target task set and the available computing nodes are determined, various alternative scheduling manners may be adopted, for example: calculating the degree of fit between each task and each node and assigning tasks according to that degree of fit; scheduling by reference to such a single index is possible, but it is a step backwards, because the feature matrix already takes multiple indexes into account;
based on the same inventive concept, the invention provides a task scheduling system based on a big data system, which comprises: the system comprises terminal equipment, a computing node and a scheduling node;
the terminal equipment is used for initiating a task creating request to the scheduling node;
the computing node is used for receiving task scheduling and executing tasks;
the scheduling node includes:
a creation module for creating a task based on the task creation request;
the task characteristic matrix calculation module is used for calculating a task characteristic matrix and putting the tasks into a task pool;
the available computing node determining module is used for acquiring an available computing node set;
the target task set selection module is used for selecting a target task set from the task pool based on the available computing node set;
an assignment module to assign tasks in a target set of tasks to the set of available computing nodes.
Preferably: the scheduling node is a cloud server built on a cloud platform; the services provided by the core service layer of the cloud computing platform supporting the cloud server include infrastructure as a service (IaaS), platform as a service (PaaS) and software as a service (SaaS); IaaS provides real or virtual computing, storage and network services, PaaS provides application deployment and management services, and SaaS provides application programs; the enterprise to which a service terminal belongs can rent these services for information exchange and processing.
The terms "compute node," "dispatch node," "terminal device," or "computing device" encompass all kinds of apparatuses, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or a plurality or combination of the foregoing. The apparatus can comprise special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform execution environment, a virtual machine, or a combination of one or more of the above. The apparatus and execution environment may implement a variety of different computing model infrastructures, such as web services, distributed computing, and grid computing infrastructures.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. The computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subroutines, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (10)

1. A task scheduling method based on a big data system is characterized by comprising the following steps:
step S1: creating a task based on the task creation request;
step S2: calculating a task characteristic matrix, and putting the tasks into a task pool;
step S3: acquiring an available computing node set;
step S4: selecting a target task set from a task pool based on a set of available computing nodes;
step S5: assigning tasks in a target set of tasks to the set of available compute nodes.
2. The big data system-based task scheduling method of claim 1, wherein task features and tasks are associated and placed into a task pool.
3. The big data system-based task scheduling method of claim 2, wherein the tasks are a data reading task, a database reading task, a data calculating task, a data analyzing task, and a database data updating task.
4. The big data system-based task scheduling method of claim 3, wherein the task creation request is from one or more terminal devices.
5. The big data system-based task scheduling method of claim 4, wherein the task is a big data platform task.
6. A big data system based task scheduling system, comprising: the system comprises terminal equipment, a computing node and a scheduling node;
the terminal equipment is used for initiating a task creating request to the scheduling node;
the computing node is used for receiving task scheduling and executing tasks;
the scheduling node includes:
a creation module for creating a task based on the task creation request;
the task characteristic matrix calculation module is used for calculating a task characteristic matrix and putting the tasks into a task pool;
the available computing node determining module is used for acquiring an available computing node set;
the target task set selection module is used for selecting a target task set from the task pool based on the available computing node set;
an assignment module to assign tasks in a target set of tasks to the set of available computing nodes.
7. A processor, wherein the processor is configured to execute a program, wherein the program executes the big data system based task scheduling method according to any one of claims 1 to 5.
8. An execution device comprising a processor coupled to a memory, the memory storing program instructions that, when executed by the processor, implement the big data system based task scheduling method of any of claims 1-5.
9. A computer-readable storage medium, characterized by comprising a program which, when run on a computer, causes the computer to execute the big data system based task scheduling method according to any of claims 1-5.
10. A cloud server, characterized in that the cloud server is configured to execute the big data system based task scheduling method according to any one of claims 1-5.
CN202210192697.1A 2022-03-01 2022-03-01 Task scheduling method and system based on big data system Active CN114546623B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210192697.1A CN114546623B (en) 2022-03-01 2022-03-01 Task scheduling method and system based on big data system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210192697.1A CN114546623B (en) 2022-03-01 2022-03-01 Task scheduling method and system based on big data system

Publications (2)

Publication Number Publication Date
CN114546623A true CN114546623A (en) 2022-05-27
CN114546623B CN114546623B (en) 2022-12-27

Family

ID=81662622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210192697.1A Active CN114546623B (en) 2022-03-01 2022-03-01 Task scheduling method and system based on big data system

Country Status (1)

Country Link
CN (1) CN114546623B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130191843A1 (en) * 2011-08-23 2013-07-25 Infosys Limited System and method for job scheduling optimization
CN106936892A (en) * 2017-01-09 2017-07-07 北京邮电大学 A kind of self-organizing cloud multi-to-multi computation migration method and system
CN107357652A (en) * 2017-06-16 2017-11-17 北京邮电大学 A kind of cloud computing method for scheduling task based on segmentation sequence and standard deviation Dynamic gene
US20180039520A1 (en) * 2015-03-24 2018-02-08 Telefonaktiebolaget Lm Ericsson (Publ) Methods and Nodes for Scheduling Data Processing
US20180067764A1 (en) * 2016-09-08 2018-03-08 International Business Machines Corporation Smart reduce task scheduler
CN110209494A (en) * 2019-04-22 2019-09-06 西北大学 A kind of distributed task dispatching method and Hadoop cluster towards big data
CN110287009A (en) * 2019-05-28 2019-09-27 北京大米科技有限公司 A kind of working node selection method, device, storage medium and server
CN110489229A (en) * 2019-07-17 2019-11-22 长沙学院 A kind of multiple target method for scheduling task and system
CN110661844A (en) * 2019-08-16 2020-01-07 北京旷视科技有限公司 Automatic release scheduling system, method and storage medium
US20200278889A1 (en) * 2019-02-28 2020-09-03 International Business Machines Corporation Task management using a virtual node
CN111651257A (en) * 2020-06-02 2020-09-11 广州道源信息科技有限公司 Task scheduling optimization method of intelligent household system
US20210081787A1 (en) * 2019-09-12 2021-03-18 Beijing University Of Posts And Telecommunications Method and apparatus for task scheduling based on deep reinforcement learning, and device
CN112631764A (en) * 2019-09-24 2021-04-09 中兴通讯股份有限公司 Task scheduling method and device, computer equipment and computer readable medium
WO2021180092A1 (en) * 2020-03-11 2021-09-16 上海商汤智能科技有限公司 Task dispatching method and apparatus

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130191843A1 (en) * 2011-08-23 2013-07-25 Infosys Limited System and method for job scheduling optimization
US20180039520A1 (en) * 2015-03-24 2018-02-08 Telefonaktiebolaget Lm Ericsson (Publ) Methods and Nodes for Scheduling Data Processing
US20180067764A1 (en) * 2016-09-08 2018-03-08 International Business Machines Corporation Smart reduce task scheduler
CN106936892A (en) * 2017-01-09 2017-07-07 北京邮电大学 A kind of self-organizing cloud multi-to-multi computation migration method and system
CN107357652A (en) * 2017-06-16 2017-11-17 北京邮电大学 A kind of cloud computing method for scheduling task based on segmentation sequence and standard deviation Dynamic gene
US20200278889A1 (en) * 2019-02-28 2020-09-03 International Business Machines Corporation Task management using a virtual node
CN110209494A (en) * 2019-04-22 2019-09-06 西北大学 A kind of distributed task dispatching method and Hadoop cluster towards big data
CN110287009A (en) * 2019-05-28 2019-09-27 北京大米科技有限公司 A kind of working node selection method, device, storage medium and server
CN110489229A (en) * 2019-07-17 2019-11-22 长沙学院 A kind of multiple target method for scheduling task and system
CN110661844A (en) * 2019-08-16 2020-01-07 北京旷视科技有限公司 Automatic release scheduling system, method and storage medium
US20210081787A1 (en) * 2019-09-12 2021-03-18 Beijing University Of Posts And Telecommunications Method and apparatus for task scheduling based on deep reinforcement learning, and device
CN112631764A (en) * 2019-09-24 2021-04-09 中兴通讯股份有限公司 Task scheduling method and device, computer equipment and computer readable medium
WO2021180092A1 (en) * 2020-03-11 2021-09-16 上海商汤智能科技有限公司 Task dispatching method and apparatus
CN111651257A (en) * 2020-06-02 2020-09-11 广州道源信息科技有限公司 Task scheduling optimization method of intelligent household system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JYOTI V. GAUTAM et al.: "A Survey on Job Scheduling Algorithms in Big Data Processing", ResearchGate *
ZHOU Xiwu et al.: "Design and Application of a Scheme for Interfacing a Cloud Platform with Hospital Information Systems under the 'Internet + Health Insurance' Model", Journal of Medical Informatics *
WANG Yan: "A Feature Matching Algorithm for Differentiated Resource Scheduling in a Cloud Computing Environment", Science and Technology of West China *

Also Published As

Publication number Publication date
CN114546623B (en) 2022-12-27

Similar Documents

Publication Publication Date Title
CN103942099B (en) Executing tasks parallelly method and device based on Hive
JP2014525640A (en) Expansion of parallel processing development environment
Tong et al. A novel task scheduling scheme in a cloud computing environment using hybrid biogeography-based optimization
CN110889497B (en) Learning task compiling method of artificial intelligence processor and related product
Dong et al. Characterizing the microarchitectural implications of a convolutional neural network (cnn) execution on gpus
CN110866029A (en) sql statement construction method, device, server and readable storage medium
Sundas et al. An introduction of CloudSim simulation tool for modelling and scheduling
CN116011562A (en) Operator processing method, operator processing device, electronic device and readable storage medium
Zhao et al. Heuristic data placement for data-intensive applications in heterogeneous cloud
Cecilia et al. Enhancing GPU parallelism in nature-inspired algorithms
Guo et al. Automated exploration and implementation of distributed cnn inference at the edge
Galea et al. A parallel simulated annealing approach for the mapping of large process networks
Ratajczak-Ropel et al. Multi-mode resource-constrained project scheduling
Morgan et al. Planning for performance: persistent collective operations for MPI
Hu et al. Cluster-scheduling big graph traversal task for parallel processing in heterogeneous cloud based on DAG transformation
CN105335135B (en) Data processing method and central node
CN114546623B (en) Task scheduling method and system based on big data system
Altenbernd Timing analysis, scheduling, and allocation of periodic hard real-time tasks
US11573777B2 (en) Method and apparatus for enabling autonomous acceleration of dataflow AI applications
Sui et al. Hybrid CPU–GPU constraint checking: Towards efficient context consistency
Khaitan et al. Parallelizing power system contingency analysis using D programming language
CN110377769A (en) Modeling Platform system, method, server and medium based on graph data structure
CN110515729A (en) Graph computing node vector load balancing method and device based on graph processor
Qian et al. {GAIA}: A System for Interactive Analysis on Distributed Graphs Using a {High-Level} Language
CN112418930B (en) Test method, system and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant