CN114546623B - Task scheduling method and system based on big data system - Google Patents

Task scheduling method and system based on big data system

Info

Publication number
CN114546623B
CN114546623B
Authority
CN
China
Prior art keywords
task
target
computing node
tasks
available
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210192697.1A
Other languages
Chinese (zh)
Other versions
CN114546623A (en)
Inventor
周希武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Second Peoples Hospital of Huaian
Original Assignee
Second Peoples Hospital of Huaian
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Second Peoples Hospital of Huaian filed Critical Second Peoples Hospital of Huaian
Priority to CN202210192697.1A
Publication of CN114546623A
Application granted
Publication of CN114546623B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/48 - Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 - Task transfer initiation or dispatching
    • G06F9/4843 - Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 - Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval of structured data, e.g. relational data
    • G06F16/25 - Integrating or interfacing systems involving database management systems
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a task scheduling method and system based on a big data system. The method comprises: creating a task based on a task creation request; calculating a task feature matrix and placing the task into a task pool; acquiring a set of available computing nodes; selecting a target task set from the task pool based on the set of available computing nodes; and assigning the tasks in the target task set to the set of available computing nodes. The invention performs task scheduling based on set-to-set matching, which improves the flexibility of task scheduling and allows scheduling to reach a local optimum where real-time constraints permit; it makes complex feature description feasible with a limited amount of computation and greatly improves task scheduling efficiency in a big data system environment.

Description

Task scheduling method and system based on big data system
[ technical field ]
The invention belongs to the field of big data, and particularly relates to a task scheduling method and system based on a big data system.
[ background ]
With technological progress, the information society has entered the big data era, and data-driven medicine is an important field of big data. With the rapid development of medical information technology, more and more medical institutions in China are accelerating the construction of informatization platforms in order to improve hospital service levels and core competitiveness; big data applications have surged as new medical reforms begin, and when rapidly growing, diversified hospital data meets big data technology, medical big data applications are quickly welcomed by medical institutions. A large hospital typically has hundreds of systems operating online, so when a hospital information center faces the impact of big data, the first problems are how to integrate information and how to combine the characteristics of hospital big data with a big data platform or big data system, so as to efficiently exploit the advantages of big data. In fact, medical information is strongly correlated: there are correlations between patients, between different periods of the same patient, and between doctors, but these correlations may not be fully understood and discovered by semantic means alone. How to exploit the inherent correlations of big data from higher levels, for example the operating system layer, the task scheduling layer, and possibly a future hardware platform layer, so as to improve the overall operating efficiency of big data, is therefore a new research direction.
A traditional task scheduling system generally only supports executing an entire job in the order given by a directed acyclic graph: whether a task node is triggered is conditioned on all of its upstream nodes having been executed successfully. Tasks are likewise executed in sequence, so the data contained in the big data and current big data analysis capabilities are not fully exploited to mine valuable new information; in other words, the task execution logic in the related art is not flexible enough. A big data system can collect and analyze preset information, including the data of the basic application systems; it is mainly responsible for collecting and analyzing the data of the whole service platform, chiefly the acquisition and comprehensive analysis of various data of the basic application systems. Therefore, how to optimize the conventional task scheduling system, and how to exploit the characteristics of big data (many nodes, many tasks, high flexibility) to improve the usability of the system for developers and its task execution efficiency, is an urgent problem.
The invention performs task scheduling based on set-to-set matching, which improves the flexibility of task scheduling and allows scheduling to reach a local optimum where real-time constraints permit. A task feature matrix, a worst task matrix and an optimal task matrix are proposed; based on this representation, the matching between computing nodes and tasks is converted into matching between tasks and tasks, so that complex feature description becomes possible with a limited amount of computation and task scheduling efficiency is greatly improved. The available states of the computing nodes are conveyed through state variables, and the state variables are modified by addition and subtraction, so that the scheduling node can accurately and timely construct the set of available computing nodes when the opportunity arrives; furthermore, differentiated information transfer is provided to the scheduling parties through differentiated placement of the state variables.
[ summary of the invention ]
In order to solve the above problems in the prior art, the present invention provides a task scheduling method and system based on a big data system, wherein:
a task scheduling method based on a big data system, the method comprising:
step S1: creating a task based on the task creation request;
step S2: calculating a task characteristic matrix, and putting the tasks into a task pool;
step S3: acquiring an available computing node set;
step S4: selecting a target task set from the task pool based on the available computing node set;
step S5: assigning tasks in a target set of tasks to the set of available compute nodes.
Further, the task features and the tasks are associated and placed into a task pool.
Further, the tasks include a data reading task, a database reading task, a data calculating task, a data analyzing task, a database data updating task and the like.
Further, the task creation request comes from one or more terminal devices.
Further, the task is a big data platform task.
A task scheduling system based on a big data system, comprising: a terminal device, a computing node and a scheduling node;
the terminal equipment is used for initiating a task creating request to the scheduling node;
the computing node is used for receiving task scheduling and executing the tasks;
the scheduling node includes:
a creation module for creating a task based on the task creation request;
the task characteristic matrix calculation module is used for calculating a task characteristic matrix and putting the tasks into a task pool;
the available computing node determining module is used for acquiring an available computing node set;
the target task set selection module is used for selecting a target task set from the task pool based on the available computing node set;
an assignment module to assign tasks in a target set of tasks to the set of available computing nodes.
A processor for running a program, wherein the program, when running, executes the above task scheduling method based on the big data system.
An execution device comprising a processor coupled to a memory, the memory storing program instructions, which when executed by the processor, implement the big data system based task scheduling method.
A computer-readable storage medium comprising a program which, when run on a computer, causes the computer to perform the big data system based task scheduling method.
A cloud server is configured to execute the big data system-based task scheduling method.
The beneficial effects of the invention include:
(1) Task scheduling is performed based on set-to-set matching, which improves the flexibility of task scheduling and allows scheduling to reach a local optimum where real-time constraints permit; meanwhile, matching between computing nodes and tasks is converted into matching between tasks through the worst task matrix and the optimal task matrix, so that complex feature description becomes possible with a limited amount of computation and task scheduling efficiency is greatly improved. (2) The available states of the computing nodes are conveyed through state variables, and the state variables are modified by addition and subtraction, so that the scheduling node can accurately and timely construct the set of available computing nodes when the opportunity arrives; furthermore, differentiated information transfer is provided to the scheduling parties through differentiated placement of the state variables.
[ description of the drawings ]
The accompanying drawings, which are included to provide a further understanding of the invention and constitute a part of this application, do not limit the invention. In the drawings:
fig. 1 is a schematic diagram of a task scheduling method based on a big data system according to the present invention.
[ detailed description ]
The invention will be described in detail below with reference to the drawings and specific embodiments; the exemplary embodiments and the description are only intended to illustrate the invention and are not to be construed as limiting it.
The big data system can collect and analyze data, mainly the acquisition and comprehensive analysis of various data of the basic application systems, and tasks must be created and reasonably scheduled for the acquisition, analysis and synthesis of these data. For the diversified tasks and the computing nodes whose states change at any time in a big data system, how tasks are scheduled is crucial. The computing nodes are located in the same or different hardware devices and are interconnected by a bus, internal circuitry or a network connection; the scheduling node is used for creating a task based on a task creation request submitted by a terminal device and scheduling the task to a suitable computing node for execution.
the invention provides a task scheduling method based on a big data system, which comprises the following steps:
step S1: creating a task based on the task creation request;
Preferably: the tasks include a data reading task, a database reading task, a data calculation task, a data analysis task, a database data updating task, and the like;
preferably: the task is a big data platform task;
preferably: the task creation request comes from one or more terminal devices;
preferably: the scheduling node creates a task based on the task creation request; for example: the scheduling node is a task scheduling server;
step S2: calculating task characteristics, associating the task characteristics with the tasks and putting the task characteristics and the tasks into a task pool;
the computing task characteristics; the method comprises the following specific steps: acquiring various types of task features, wherein each type of task feature comprises one or more sub-features, and after the sub-features are normalized, representing each type of feature by using a vector; finally, combining a plurality of vectors corresponding to a plurality of types to form a matrix, and expressing the task characteristics by using a task characteristic matrix; when the number of the sub-features is different, parts with different corresponding vector lengths are filled with 0;
Preferably: the task features comprise contour features, data features and execution features; the task's contour features, data features and execution features are converted into a contour feature vector, a data feature vector and an execution feature vector respectively; for each type of feature, numerical sub-features are normalized and used as elements of the vector, and non-numerical sub-features are first converted into numerical values and then used as elements; for example, the contour features include the task type, the owner, the real-time property, the number of temporary variables, and/or the task execution time, etc.; task type TypeA is converted to the value 1, TypeB to the value 2, and so on; when there are 10 task types, the normalized vector element corresponding to the task type sub-feature of TypeA is 0.1; through normalization, all vector elements belong to [0,1];
Preferably: the contour features are used to characterize the overall profile of the task;
The data features are used to characterize the data involved in the execution of the task, for example: the number of static variables, the number of temporary variables, the length of the data file, and/or the size of the task's data space, etc.;
The execution features are used to characterize the behavior of the task during execution, for example: characteristics of the task execution graph, the number of loops, the length of loop variables, the length of the task execution path, and/or the number of functions, etc.;
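Purely as an illustration (not part of the patent), the following Python sketch assembles such a task feature matrix, assuming the sub-features of each feature type have already been normalized to [0,1] and that shorter rows are zero-padded to a common length; all function names and values are hypothetical.

```python
import numpy as np

def build_task_feature_matrix(contour, data, execution):
    """Assemble a task feature matrix from three per-type sub-feature lists.

    Each argument is a list of sub-feature values already normalized to [0, 1].
    Shorter vectors are padded with 0 so that every row has the same length.
    """
    rows = [np.asarray(v, dtype=float) for v in (contour, data, execution)]
    width = max(len(r) for r in rows)
    padded = [np.pad(r, (0, width - len(r))) for r in rows]
    return np.vstack(padded)   # shape: (number of feature types, max sub-features)

# Illustrative task: type 1 of 10, owner 3 of 20, real-time flag, etc.
contour = [1 / 10, 3 / 20, 1.0, 0.25, 0.4]   # task type, owner, real-time, temp vars, exec time
data = [0.2, 0.25, 0.6, 0.5]                 # static vars, temp vars, file length, data space
execution = [0.3, 0.1, 0.7]                  # loop count, loop-variable length, path length

T = build_task_feature_matrix(contour, data, execution)
print(T.shape)   # (3, 5)
```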
The task pool is used to manage the tasks to be scheduled; it can be managed as a queue, and the tasks in the task pool are sorted by priority; the priority here is a scheduling priority, which can be calculated by any of various existing methods;
Preferably: a latest scheduling time is set for each task; for a real-time task the latest scheduling time is derived from its real-time attribute, and when a task is not a real-time task and no latest scheduling time has been set, a timer of default length is used to determine its latest scheduling time; in this way, no task is left unscheduled forever because of the set-based local optimization;
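As a hedged sketch only, the task pool can be kept as a priority queue in which each task also carries such a latest scheduling time; the field names (priority, real_time, real_time_deadline) and the default timer length are assumptions, not taken from the patent.

```python
import heapq
import time

DEFAULT_WINDOW = 60.0   # assumed default timer length (seconds) for non-real-time tasks

class TaskPool:
    """Task pool managed as a priority queue; higher priority is popped first."""

    def __init__(self):
        self._heap = []      # entries: (-priority, insertion order, task)
        self._counter = 0

    def put(self, task: dict) -> None:
        now = time.time()
        if "latest_schedule_time" not in task:
            # Real-time tasks derive the deadline from their real-time attribute;
            # other tasks fall back to a default-length timer so no task starves.
            window = task["real_time_deadline"] if task.get("real_time") else DEFAULT_WINDOW
            task["latest_schedule_time"] = now + window
        heapq.heappush(self._heap, (-task["priority"], self._counter, task))
        self._counter += 1

    def pop_highest_priority(self) -> dict:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```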
step S3: acquiring the set of available computing nodes; specifically: availability information is conveyed through the values of state variables; when a computing node becomes available, it modifies the corresponding state variable; the scheduling server reads the state variable to learn the availability of the computing nodes and constructs the current set of available computing nodes based on this availability;
When a computing node and the scheduling node communicate over internal lines such as a bus, exact and accurate information can be obtained; in a big data system, however, the communication between computing nodes and the scheduling node is not that precise and definite. The invention conveys availability information through state variable values, so that even if the information of a computing node is occasionally read incorrectly, at most one scheduling opportunity is wasted; and because the state variable can be read and written independently by the computing node and the scheduling node, great flexibility is gained. On this basis, using state variables reduces the communication overhead in this situation and eliminates complicated handshake operations; conveying the available state of the computing nodes through state variables allows the scheduling node to accurately and timely construct the set of available computing nodes when the opportunity arrives;
Preferably: the state variable is an integer variable; the state variable is modified by addition and subtraction to express different availability information; different computing nodes modify the state variable by adding or subtracting their own fixed values to express changes in their availability, and the fixed values of different computing nodes differ. For example, a 32-bit integer variable can represent the changes of 32 computing nodes: the fixed value of the first computing node is 1 (binary 1), the fixed value of the second computing node is 2 (binary 10), and so on, so the state of the first computing node is changed by adding or subtracting 1. If the current state variable is 00000111 (binary) and 1 identifies "available", the state variable indicates that the first, second and third computing nodes are available and the other computing nodes are not. In this way the overhead of conveying state is minimized, and the number of state variables grows only very slowly as the number of computing nodes under scheduling management increases;
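The bit-per-node scheme could be sketched as follows, assuming node i owns the fixed value 2^i and the scheduling node merely reads the shared integer; the Lock stands in for whatever atomic update a real system would use, and all names are illustrative, not prescribed by the patent.

```python
from threading import Lock

_state = 0       # shared availability state variable (one bit per computing node)
_lock = Lock()   # placeholder for an atomic fetch-add in a real deployment

def node_value(node_index: int) -> int:
    """Fixed value of a node: 1 for node 0, binary 10 (= 2) for node 1, and so on."""
    return 1 << node_index

def mark_available(node_index: int) -> None:
    """A computing node adds its fixed value when it becomes available."""
    global _state
    with _lock:
        if not _state & node_value(node_index):
            _state += node_value(node_index)

def mark_busy(node_index: int) -> None:
    """A computing node subtracts its fixed value when it is no longer available."""
    global _state
    with _lock:
        if _state & node_value(node_index):
            _state -= node_value(node_index)

def available_nodes(num_nodes: int) -> set[int]:
    """The scheduling node reads the state variable and builds the available node set."""
    snapshot = _state
    return {i for i in range(num_nodes) if snapshot & node_value(i)}

mark_available(0); mark_available(1); mark_available(2)   # state variable becomes 0b00000111
print(available_nodes(32))                                # {0, 1, 2}
```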
Correspondingly, differentiated support for different computing nodes and for task scheduling can be provided through differentiated placement of the state variables;
Preferably: there are one or more state variables; when there are multiple state variables, they are placed in different storage locations of the same node or in different nodes;
Placing the state variables in different nodes specifically means: each state variable corresponds to the availability representation of one or more computing nodes; a first set of computing nodes corresponds to a first state variable and a second set of computing nodes corresponds to a second state variable; the first and second sets each contain one or more computing nodes, which are the targets of task scheduling; the first state variable is placed in a first node and the second state variable in a second node, the first and second nodes being different physical nodes; by analogy, more state variables can be set for further differentiation requirements, that is, for more sets of computing nodes, and placed in the same or different nodes; here the nodes are big data system nodes and include various computing entities such as computing nodes, scheduling nodes, security nodes, computing cores and processors;
A more specific approach: the state variables are placed in the scheduling node or in different nodes; these nodes may be nodes providing a higher or a different security level, or, for an important computing node, the corresponding state variable is placed directly on that computing node, so that after the scheduling node obtains access rights to a specific area it actively reads the information from the important computing node, reducing the additional work imposed on it; the specific area includes the area where the state variable is located;
Placing the state variables in different storage locations of the same node specifically means: each state variable corresponds to the state representation of one or more computing nodes; a first state variable corresponding to the first set of computing nodes is placed at a first location of a first node, and a second state variable corresponding to the second set of computing nodes is placed at a second location of the first node; the first and second locations are different storage locations of the first node, located on different storage units; the different storage units include caches, registers, memories, different banks of a memory, encrypted storage units, unencrypted storage units and/or extended storage units, etc.;
A more specific approach: both the first state variable and the second state variable are placed on the scheduling node, and the computing nodes targeted by scheduling are different computing units on the same computing device.
Preferably: the different computing units are different processors and different computing cores on the same computing device, connected via a bus; the state variables corresponding to different computing cores are placed in a cache, and the state variables corresponding to different processors are placed in memory;
step S4: selecting a target task set from the task pool based on the set of available computing nodes; specifically: calculating the node set features of the available computing node set; selecting target tasks from the task pool to form a target task set such that the task set features of the target task set have the highest matching degree with the node set features;
the step S4 specifically includes the following steps:
step S41: for each computing node in the available computing node set, acquiring an optimal task matrix and a worst task matrix corresponding to the computing node;
One specific way of obtaining them is as follows: execute various tasks on the computing node, record their execution efficiency, determine the tasks with the highest and the lowest execution efficiency, and extract the feature matrices corresponding to those two tasks to obtain the optimal task matrix and the worst task matrix; the identifier of the computing node is stored in association with its optimal task matrix and worst task matrix;
The execution efficiency can be one index or a combination of several indexes, such as execution time and resource utilization rate;
The optimal task matrix of a computing node is the task feature matrix of the task that executes most efficiently on that node; the worst task matrix of a computing node is the task feature matrix of the task that executes least efficiently on that node; since the configurations of the computing nodes may be the same or different, the optimal and worst task matrices of different computing nodes may likewise be the same or different;
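A non-authoritative sketch of such profiling is shown below, under the assumption that execution efficiency is summarized as a single score (e.g. inverse execution time, possibly blended with resource utilization) and that the feature matrices of the profiled tasks are already available; all identifiers are illustrative.

```python
import numpy as np

def profile_node(node_id, profiled_tasks, run_on_node):
    """Obtain the optimal (MXG) and worst (MXB) task matrices of one computing node.

    profiled_tasks : list of (task_feature_matrix, task) pairs
    run_on_node    : callable(node_id, task) -> efficiency score (higher is better)
    """
    scored = [(run_on_node(node_id, task), feats) for feats, task in profiled_tasks]
    _, mxg = max(scored, key=lambda s: s[0])   # feature matrix of the most efficient task
    _, mxb = min(scored, key=lambda s: s[0])   # feature matrix of the least efficient task
    return {"node": node_id, "MXG": mxg, "MXB": mxb}

# Illustrative use with random feature matrices and a dummy efficiency function
rng = np.random.default_rng(0)
tasks = [(rng.random((3, 5)), f"task{i}") for i in range(4)]
profile = profile_node("node-7", tasks, lambda node, task: rng.random())
```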
In the prior art, the characteristics of a computing node, such as its storage and computation capabilities, are described, and the matching degree is determined by matching node features against task features. In practice, however, as the number of features used for description grows, describing the matching degree between node features and task features becomes difficult; the matching degree can be measured by artificial intelligence means such as neural networks, but the amount of computation then grows explosively with the number of descriptive features;
step S42: selecting target tasks to form a target task set; specifically: when this step is entered for the first time, a target number of tasks are selected from the task pool as target tasks in descending order of priority, where the target number equals the number of nodes in the set of available computing nodes; when this step is entered again, a first number of target tasks are newly selected from the task pool and put into the target task set, and a first number of target tasks are swapped out of the target task set and returned to the task pool;
Preferably: a first number of target tasks are selected from the task pool in descending order of priority and put into the target task set, and a first number of target tasks are swapped out of the target task set and returned to the task pool; that is, a first number of tasks are exchanged each time;
Preferably: the first number is equal to 1;
Swapping the first number of target tasks out of the target task set specifically comprises: selecting the first number of lowest-priority target tasks in the target task set and swapping them out;
Preferably: when this step is entered again, it is first determined whether any target task can be swapped out; if so, execution continues, otherwise the target task set is left unchanged and the method proceeds to step S45; when every task in the target task set has reached its latest scheduling time, no task can be swapped out any longer; likewise, when tasks are swapped in, tasks that have reached their latest scheduling time are necessarily selected, or preferentially swapped into the target task set, because of their higher priority;
Preferably: if the number of target tasks that can be swapped out is smaller than the first number, the first number is reset to the number of target tasks that can be swapped out, and the first number is reset again the next time this step is entered;
the method comprises the following steps of reselecting a first number of target tasks from a task pool to be placed into a target task set, and replacing the first number of target tasks from the target task set to be placed back into the task pool, and specifically comprises the following steps: selecting a first number of tasks as target tasks to be put into a target task set according to the priority level in the tasks which are not selected in the task pool for subsequent matching degree calculation; at the same time, a first number of target tasks need to be swapped out, i.e. deleted from the set; one mode of swapping out selection is to select a first number of target tasks with the lowest priority from a target task set to swap out, and the swapped out target tasks are put into a scheduling pool and are no longer the target tasks; when the next scheduling opportunity comes, the target task can be selected again;
step S43: calculating the feature matching degree between the target task set and the available computing node set; when the feature matching degree is higher than a preset matching degree, proceed to step S44; otherwise, return to step S42;
Calculating the feature matching degree between the target task set and the available computing node set specifically comprises the following steps:
step S43A1: constructing the current scheduling mode SCH_CNT, then proceeding to step S43A2;
Constructing the current scheduling mode SCH_CNT specifically comprises: for each task in the target task set, in descending order of task priority, calculating the distance between the task's feature matrix and the optimal feature matrix of each node, and scheduling the task to the computing node whose optimal feature matrix has the smallest distance; the resulting assignment is taken as the current scheduling mode SCH_CNT;
step S43A2: calculating the feature matching degree MH under the current scheduling mode:
MH = Σ_{i=1}^{n} MH_i

MH_i = ||T_k - MXB_i|| - ||T_k - MXG_i||

wherein: MH_i is the feature matching degree of the i-th computing node under the current scheduling mode; MH is the feature matching degree between the target task set and the available computing node set under the current scheduling mode; n is the number of target tasks; MXG_i is the optimal feature matrix of the i-th computing node; MXB_i is the worst feature matrix of the i-th computing node; T_k is the task feature matrix of the task k assigned to the i-th computing node under the current scheduling mode; || · || denotes the norm (modulus) of a matrix;
step S43A3: outputting the current scheduling mode and the corresponding feature matching degree;
Through this process of local optimization within the set, the scheduling strategy is optimized while priority-based scheduling is still guaranteed;
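A non-authoritative sketch of steps S43A1-S43A2 follows, under stated assumptions: the matrix norm is taken to be the Frobenius norm, MH is the sum of the per-node MH_i, and each of the n target tasks is assigned to a distinct available node (the patent sets the target number equal to the node count); all identifiers are illustrative.

```python
import numpy as np

def build_schedule_and_match(target_tasks, node_profiles):
    """Construct the current scheduling mode SCH_CNT and its matching degree MH.

    target_tasks  : list of (task_id, T_k) pairs, in descending priority order
    node_profiles : dict node_id -> {"MXG": optimal matrix, "MXB": worst matrix}
    Each task is assigned to the still-free node whose optimal matrix MXG is nearest
    to the task's feature matrix T_k; the per-node matching degree
    MH_i = ||T_k - MXB_i|| - ||T_k - MXG_i|| is accumulated into MH.
    """
    free = dict(node_profiles)
    sch_cnt, mh = {}, 0.0
    for task_id, t_k in target_tasks:
        node_id = min(free, key=lambda n: np.linalg.norm(t_k - free[n]["MXG"]))
        mxg, mxb = free[node_id]["MXG"], free[node_id]["MXB"]
        mh += np.linalg.norm(t_k - mxb) - np.linalg.norm(t_k - mxg)   # MH_i
        sch_cnt[task_id] = node_id
        free.pop(node_id)   # one target task per available node
    return sch_cnt, mh
```

A larger MH indicates that, in aggregate, the assigned tasks sit closer to each node's optimal task matrix than to its worst task matrix, which is the sense in which the set-level matching degree is maximized.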
step S44: taking the target task set as the selected target task set and outputting it; specifically: the target task sets and their corresponding feature matching degrees are recorded during the execution of the above steps, and the target task set with the highest feature matching degree is taken as the selected target task set and output;
Preferably: the target task set is output in association with its corresponding scheduling mode;
The scheduling described here assigns a task to a computing node; of course, when a task is very large, assigning the whole task to a single computing unit may leave that unit unable to respond and the task unable to be completed in time;
step S5: assigning the tasks in the target task set to the set of available computing nodes; specifically: the tasks in the target task set are assigned to the available computing nodes according to the current scheduling mode;
Of course, once the target task set and the available computing nodes have been determined, various alternative scheduling manners could be used, for example calculating a degree of fit between each task and each node and assigning tasks according to that degree of fit, or scheduling by reference to a single index; however, since the feature matrix already takes multiple indexes into account, such approaches represent a step backwards;
Based on the same inventive concept, the invention provides a task scheduling system based on a big data system, comprising: a terminal device, a computing node and a scheduling node;
the terminal equipment is used for initiating a task creating request to the scheduling node;
the computing node is used for receiving task scheduling and executing tasks;
the scheduling node includes:
a creation module for creating a task based on the task creation request;
the task characteristic matrix calculation module is used for calculating a task characteristic matrix and putting the tasks into a task pool;
the available computing node determining module is used for acquiring an available computing node set;
the target task set selection module is used for selecting a target task set from the task pool based on the available computing node sets;
an assignment module to assign tasks in a target set of tasks to the set of available computing nodes.
Preferably: the scheduling node is a cloud server built on a cloud platform; the services provided by the core service layer of the cloud computing platform supporting the cloud server include infrastructure as a service (IaaS), platform as a service (PaaS) and software as a service (SaaS), where IaaS provides real or virtual computing, storage and network services, PaaS provides application deployment and management services, and SaaS provides application programs. The enterprise to which the service terminal belongs can rent these information exchange and processing services.
The terms "compute node," "dispatch node," "terminal device," or "computing device" encompass all kinds of apparatuses, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or a plurality or combination of the foregoing. The apparatus can comprise special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform execution environment, a virtual machine, or a combination of one or more of the foregoing. The apparatus and execution environment may implement a variety of different computing model infrastructures, such as web services, distributed computing, and grid computing infrastructures.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. The computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subroutines, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (10)

1. A task scheduling method based on a big data system is characterized by comprising the following steps:
step S1: creating a task based on the task creation request;
step S2: calculating task characteristics, associating the task characteristics with the tasks and putting the task characteristics and the tasks into a task pool;
the computing task characteristics; the method comprises the following specific steps: acquiring various types of task features, wherein each type of task feature comprises one or more sub-features, and after the sub-features are normalized, each type of feature is expressed by a vector; finally, combining a plurality of vectors corresponding to a plurality of types to form a matrix, and expressing the task characteristics by using a task characteristic matrix;
step S3: acquiring an available computing node set; specifically, the method comprises the following steps: passing available information through the state variable values; when a computing node is available, the computing node modifies a corresponding state variable, a scheduling server reads the state variable so as to obtain the available condition of the computing node, and a current available computing node set is constructed based on the available condition;
step S4: selecting a target task set from a task pool based on a set of available computing nodes; the method specifically comprises the following steps: calculating the node set characteristics of the available computing node set; selecting a target task from the task set to form a target task set, and enabling the task set characteristics of the target task set to be the highest in matching degree with the node set characteristics;
the step S4 specifically includes the following steps:
step S41: for each computing node in the available computing node set, acquiring an optimal task matrix and a worst task matrix corresponding to the computing node; the optimal task matrix of the computing node is a task characteristic matrix corresponding to a task with the highest execution efficiency for the computing node; the worst task matrix of the computing node is a task characteristic matrix corresponding to a task with the lowest execution efficiency for the computing node;
step S42: selecting a target task from the task set to form a target task set; the method comprises the following specific steps: when the step is entered for the first time, selecting a target number of tasks from the task pool as target tasks according to the sequence of the priority from high to low, wherein: the target number is equal to the number of nodes in the available set of computing nodes; when the step is entered again, a first number of target tasks are selected from the task pool again and put into the target task set, and the first number of target tasks are replaced from the target task set and put back into the task pool;
step S43: calculating the characteristic matching degree of the target task set and the available computing node set; when the feature matching degree is higher than the preset matching degree, the step S44 is executed; otherwise, go to step S42;
step S44: taking the target task set as a selected target task set and outputting the target task set; specifically, the method comprises the following steps: recording the selected target task set and the corresponding characteristic matching degree thereof in the execution process of the steps, and taking the target task set with the highest characteristic matching degree as the selected target task set and outputting the target task set;
step S5: assigning tasks in a target set of tasks to the set of available compute nodes.
2. The big data system-based task scheduling method of claim 1, wherein task features and tasks are associated and placed into a task pool.
3. The big data system-based task scheduling method according to claim 2, wherein the tasks are a data reading task, a database reading task, a data calculating task, a data analyzing task, and a database data updating task.
4. The big data system-based task scheduling method according to claim 3, wherein the task creation request is from one or more terminal devices.
5. The big data system-based task scheduling method according to claim 4, wherein the task is a big data platform task.
6. A big data system-based task scheduling system, comprising: the system comprises terminal equipment, a computing node and a scheduling node;
the terminal equipment is used for initiating a task creating request to the scheduling node;
the computing node is used for receiving task scheduling and executing tasks;
the scheduling node includes:
a creation module for creating a task based on the task creation request;
the task characteristic matrix calculation module is used for calculating task characteristics, correlating the task characteristics with the tasks and putting the task characteristics and the tasks into a task pool;
the computing task characteristics; the method specifically comprises the following steps: acquiring various types of task features, wherein each type of task feature comprises one or more sub-features, and after the sub-features are normalized, each type of feature is expressed by a vector; finally, combining a plurality of vectors corresponding to a plurality of types to form a matrix, and expressing the task characteristics by using a task characteristic matrix;
the available computing node determining module is used for acquiring an available computing node set; specifically, the method comprises the following steps: passing available information through the state variable values; when a computing node is available, the computing node modifies a corresponding state variable, a scheduling server reads the state variable so as to obtain the available condition of the computing node, and a current available computing node set is constructed based on the available condition;
the target task set selection module is used for selecting a target task set from the task pool based on the available computing node set; the method specifically comprises the following steps: calculating the node set characteristics of the available computing node set; selecting a target task from the task set to form a target task set, and enabling the task set characteristics of the target task set to be matched with the node set characteristics to the highest degree;
the target task set selection module specifically executes the following steps:
step S41: for each computing node in the available computing node set, acquiring an optimal task matrix and a worst task matrix corresponding to the computing node; the optimal task matrix of the computing node is a task characteristic matrix corresponding to a task with the highest execution efficiency for the computing node; the worst task matrix of the computing node is a task characteristic matrix corresponding to a task with the lowest execution efficiency for the computing node;
step S42: selecting a target task from the task set to form a target task set; the method comprises the following specific steps: when the step is entered for the first time, selecting a target number of tasks from the task pool as target tasks according to the sequence of the priorities from high to low, wherein: the target number is equal to the number of nodes in the set of available computing nodes; when the step is entered again, a first number of target tasks are selected from the task pool again and put into the target task set, and the first number of target tasks are replaced from the target task set and put back into the task pool;
step S43: calculating the characteristic matching degree of the target task set and the available computing node set; when the feature matching degree is higher than the preset matching degree, the step S44 is executed; otherwise, go to step S42;
step S44: taking the target task set as a selected target task set and outputting the target task set; specifically, the method comprises the following steps: recording the selected target task set and the corresponding feature matching degree thereof in the execution process of the steps, and taking the target task set with the highest feature matching degree as the selected target task set and outputting the target task set;
an assignment module to assign a task of a target set of tasks to the set of available computing nodes.
7. A processor, wherein the processor is configured to execute a program, wherein the program executes the big data system based task scheduling method according to any one of claims 1 to 5.
8. An execution device comprising a processor coupled to a memory, the memory storing program instructions, the program instructions stored by the memory when executed by the processor implementing the big data system based task scheduling method of any of claims 1-5.
9. A computer-readable storage medium, characterized by comprising a program which, when run on a computer, causes the computer to execute the big data system based task scheduling method according to any of claims 1-5.
10. A cloud server, characterized in that the cloud server is configured to execute the big data system based task scheduling method according to any one of claims 1-5.
CN202210192697.1A 2022-03-01 2022-03-01 Task scheduling method and system based on big data system Active CN114546623B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210192697.1A CN114546623B (en) 2022-03-01 2022-03-01 Task scheduling method and system based on big data system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210192697.1A CN114546623B (en) 2022-03-01 2022-03-01 Task scheduling method and system based on big data system

Publications (2)

Publication Number Publication Date
CN114546623A CN114546623A (en) 2022-05-27
CN114546623B true CN114546623B (en) 2022-12-27

Family

ID=81662622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210192697.1A Active CN114546623B (en) 2022-03-01 2022-03-01 Task scheduling method and system based on big data system

Country Status (1)

Country Link
CN (1) CN114546623B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9128763B2 (en) * 2011-08-23 2015-09-08 Infosys Limited System and method for job scheduling optimization
US10606650B2 (en) * 2015-03-24 2020-03-31 Telefonaktiebolaget Lm Ericsson (Publ) Methods and nodes for scheduling data processing
US10360065B2 (en) * 2016-09-08 2019-07-23 International Business Machines Corporation Smart reduce task scheduler
US10831557B2 (en) * 2019-02-28 2020-11-10 International Business Machines Corporation Task management using a virtual node
CN110580196B (en) * 2019-09-12 2021-04-06 北京邮电大学 Multi-task reinforcement learning method for realizing parallel task scheduling
CN111651257A (en) * 2020-06-02 2020-09-11 广州道源信息科技有限公司 Task scheduling optimization method of intelligent household system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106936892A (en) * 2017-01-09 2017-07-07 北京邮电大学 A kind of self-organizing cloud multi-to-multi computation migration method and system
CN107357652A (en) * 2017-06-16 2017-11-17 北京邮电大学 A kind of cloud computing method for scheduling task based on segmentation sequence and standard deviation Dynamic gene
CN110209494A (en) * 2019-04-22 2019-09-06 西北大学 A kind of distributed task dispatching method and Hadoop cluster towards big data
CN110287009A (en) * 2019-05-28 2019-09-27 北京大米科技有限公司 A kind of working node selection method, device, storage medium and server
CN110489229A (en) * 2019-07-17 2019-11-22 长沙学院 A kind of multiple target method for scheduling task and system
CN110661844A (en) * 2019-08-16 2020-01-07 北京旷视科技有限公司 Automatic release scheduling system, method and storage medium
CN112631764A (en) * 2019-09-24 2021-04-09 中兴通讯股份有限公司 Task scheduling method and device, computer equipment and computer readable medium
WO2021180092A1 (en) * 2020-03-11 2021-09-16 上海商汤智能科技有限公司 Task dispatching method and apparatus

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"互联网+健康保险"模式下云平台与医院信息系统对接方案设计及应用;周希武等;《医学信息学杂志》;20180625(第06期);第34-37页 *
A Survey on Job Scheduling Algorithms in Big Data Processing;Jyoti V.Gautam等;《ResearchGate》;20150331;第1-10页 *
Feature Matching Algorithm for Differentiated Resource Scheduling in a Cloud Computing Environment; Wang Yan; Science and Technology of West China (中国西部科技); 2015-12-25 (No. 12); pp. 82-84 *

Also Published As

Publication number Publication date
CN114546623A (en) 2022-05-27

Similar Documents

Publication Publication Date Title
Ooi et al. SINGA: A distributed deep learning platform
CN103942099B (en) Executing tasks parallelly method and device based on Hive
Barba-Gonzaléz et al. Multi-objective big data optimization with jmetal and spark
US20200073677A1 (en) Hybrid computing device selection analysis
Dong et al. Characterizing the microarchitectural implications of a convolutional neural network (cnn) execution on gpus
Zhao et al. Heuristic data placement for data-intensive applications in heterogeneous cloud
Sundas et al. An introduction of CloudSim simulation tool for modelling and scheduling
LaSalle et al. Mpi for big data: New tricks for an old dog
Cecilia et al. Enhancing GPU parallelism in nature-inspired algorithms
CN116467061A (en) Task execution method and device, storage medium and electronic equipment
Carneiro Pessoa et al. GPU‐accelerated backtracking using CUDA Dynamic Parallelism
Tchernykh et al. Mitigating uncertainty in developing and applying scientific applications in an integrated computing environment
Guo et al. Automated exploration and implementation of distributed cnn inference at the edge
Morgan et al. Planning for performance: persistent collective operations for MPI
Bansal et al. A framework for performance analysis and tuning in hadoop based clusters
Ratajczak-Ropel et al. Multi-mode resource-constrained project scheduling
CN113807046A (en) Test excitation optimization regression verification method, system and medium
Hu et al. Cluster-scheduling big graph traversal task for parallel processing in heterogeneous cloud based on DAG transformation
CN105335135B (en) Data processing method and central node
CN114546623B (en) Task scheduling method and system based on big data system
Altenbernd Timing analysis, scheduling, and allocation of periodic hard real-time tasks
US11573777B2 (en) Method and apparatus for enabling autonomous acceleration of dataflow AI applications
Sui et al. Hybrid CPU–GPU constraint checking: Towards efficient context consistency
Khaitan et al. Parallelizing power system contingency analysis using D programming language
CN107818501B (en) Actuarial method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant