CN111506402A - Computer task scheduling method, device, equipment and medium for machine learning modeling


Info

Publication number
CN111506402A
CN111506402A (application CN202010242943.0A; granted as CN111506402B)
Authority
CN
China
Legal status
Granted
Application number
CN202010242943.0A
Other languages
Chinese (zh)
Other versions
CN111506402B (en)
Inventor
胡宸章
朱明杰
魏岩
Current Assignee
Shanghai Creditx Information Technology Co ltd
Original Assignee
Shanghai Creditx Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Creditx Information Technology Co ltd
Priority to CN202010242943.0A
Publication of CN111506402A
Application granted
Publication of CN111506402B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a computer task scheduling method, device, equipment and medium for machine learning modeling. The method comprises the following steps: S1, establishing a task module that contains a require function; S2, defining the computation task to be executed as task A and the computation task on which task A depends as task B, each computation task being executed by calling the run function in its computation module; S3, task A imports the task module; S4, task A imports the computation module containing task B; S5, task A calls the require function in the task module, passing the computation module containing task B as a parameter; and S6, the require function obtains the computation result of task B and returns it to task A. The invention automatically derives the dependency relationships among computation tasks from the program code, records and tracks changes to program code, data files, and dependencies, and determines whether each computation task needs to be executed, overcoming the defects of existing computation task scheduling approaches.

Description

Computer task scheduling method, device, equipment and medium for machine learning modeling
Technical Field
The invention belongs to the technical field of computer data processing, and particularly relates to a computer task scheduling method, device, equipment and medium for machine learning modeling.
Background
Machine learning modeling projects typically involve multiple computation tasks. For each computation task, the computer executes the program corresponding to that task. Computation tasks have dependency relationships: a task can only be executed after the tasks it depends on have finished. (For example, if computation task C depends on computation tasks A and B, then C must be executed after A and B have completed.) The program corresponding to each computation task, and the dependencies between tasks, may change as the project progresses. A changed computation task needs to be re-executed, and if a depended-on task is re-executed, every task that depends on it should also be re-executed.
The existing scheduling approach is mainly manual: a person operates the computer to run the program for each computation task that needs to be executed. When there are many computation tasks and the dependencies are complex, repeated execution and omitted execution easily occur.
Disclosure of Invention
To address the problems in the prior art, the invention provides a computer task scheduling method for machine learning modeling that uses a computer program to automatically manage the execution of computation tasks, overcoming the difficulty manual scheduling has in handling many tasks and complex dependencies. The invention also provides a corresponding computer task scheduling device, equipment and medium for machine learning modeling.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a computer task scheduling method facing machine learning modeling, which comprises the following steps:
s1, establishing a task module, wherein the task module comprises a require function, and the require function is used for calculating the scheduling of tasks;
s2, defining a calculation task to be executed as a task A, defining a calculation task on which the task A depends as a task B, programming each calculation task in a calculation module, and executing the calculation task by calling a run function in the calculation module;
s3, the task A introduces a task module;
s4, the task A introduces a calculation module where the task B is located;
s5, the task A calls a require function in the task module, and a calculation module where the task B is located is used as a parameter to be transmitted;
and S6, the require function calls the calculation result of the task B and returns the calculation result to the task A.
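As a rough illustration, steps S1 to S6 can be sketched in Python as follows (the names and the use of in-memory namespace objects in place of separate source files are assumptions of this sketch; the caching and change tracking described later are omitted):

```python
import types

# S1: a hypothetical task module exposing require(), which schedules
# computation tasks (caching and change tracking omitted in this sketch).
def require(comp_module, *args):
    # S6: run the depended-on task via its run() entry point and
    # return its result to the caller.
    return comp_module.run(*args)

task = types.SimpleNamespace(require=require)

# S2: each computation task lives in a computation module whose entry
# point is a run() function; an in-memory object stands in for the
# module file of task B here.
compute_b = types.SimpleNamespace(run=lambda: sum(range(10)))

# S3/S4: task A "imports" the task module and task B's module;
# S5: task A calls require(), passing B's module as the parameter.
result_b = task.require(compute_b)
```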
The run function of each computation task may take optional parameters; calls to the same run function with different parameters are treated as different computation tasks. The parameters and return values of the run function are serializable.
Each computing task corresponds to a unique ID and a unique cache file, and the cache files are stored in the state cache module.
Each computation task obtains its unique ID by taking the path of its run function, serializing all optional parameters, computing the hash value of the parameters with the SHA-1 algorithm, and concatenating the path with the parameter hash.
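A sketch of this ID construction in Python (the use of pickle for serialization and the exact concatenation format are assumptions of this illustration; the patent only specifies path plus SHA-1 of the serialized parameters):

```python
import hashlib
import pickle

def task_unique_id(run_path, *args, **kwargs):
    """Build a task's unique ID from its run function's file path and
    the SHA-1 hash of its serialized optional parameters."""
    # Serialize all optional parameters into a byte string
    params = pickle.dumps((args, sorted(kwargs.items())))
    # SHA-1 hash of the serialized parameters
    param_hash = hashlib.sha1(params).hexdigest()
    # Concatenate the path with the parameter hash
    return f"{run_path}:{param_hash}"
```

Because the parameters are hashed into the ID, the same run function called with different parameters yields different IDs and therefore distinct cache files.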
In step S6, the require function obtains the return value of the run function in the computation module of task B.
In step S6, the decision process of the require function comprises the following steps:
S61, obtaining the path of the run function in the computation module of task B, serializing all optional parameters, computing the hash value of the parameters with the SHA-1 algorithm, and concatenating the two to obtain the ID of task B;
S62, if the dependency digest value of the computation module has not been computed in the current run, obtaining the dependency digest value of the computation task using the Tarjan Hash algorithm; if it has already been computed in the current run, going directly to step S63;
S63, looking up the cache file corresponding to the computation task in the state cache module using the unique ID; if the cache file does not exist, creating and locking it, marking task B as needing execution, and going to step S64; if the cache file exists, opening and locking it, reading the dependency digest value from it, and comparing the cached dependency digest value with the one from the current run; if the values differ, marking task B as needing execution and going to step S64; if they are the same, marking task B as not needing execution and going to step S65;
S64, calling the run function of task B, writing the current run's dependency digest value into the cache file, and writing the run function's return value into task B's cache file;
and S65, unlocking the cache file and returning the return value.
The computation of the dependency digest value comprises the following steps:
step a, establishing two stack data structures, S_search and S_member;
step b, pushing the computation module of task B onto both S_search and S_member;
step c, examining the module m on top of S_search; if module m has not been visited, going to step d; if module m has been visited, going to step e;
step d, marking module m as visited; setting low(m), the earliest-visited module reachable from m, to m itself; traversing all unvisited modules m' that module m depends on, pushing each m' onto S_search and S_member, and returning to step c;
step e, popping module m from S_search and establishing a string h_deps for it; computing the hash value of module m's program code file with the SHA-1 algorithm and appending it to m's h_deps; traversing the data files f of module m, computing the hash value of each with the SHA-1 algorithm and appending it to m's h_deps; traversing all modules m' that module m depends on: if module m' has already produced a hash value h_final, appending the h_final of m' to m's h_deps; if module m' has not yet produced a hash value, updating low(m) according to low(m'), the earliest-visited module reachable from m'; if low(m) is not m itself, returning to step c; if low(m) is m itself, going to step f;
step f, establishing a sequence L; traversing all modules m' in S_member from the top down to m and appending the h_deps of each m' to L; sorting the h_deps strings in L; computing the hash value h of all strings in L with the SHA-1 algorithm; traversing all modules m' in S_member from the top down to module m and setting the h_final of each m' to h; and popping modules from S_member until module m has been popped;
and step g, if S_search is not empty, returning to step c; otherwise the algorithm ends, the h_final value of each module is recorded as that module's dependency digest value, and the h_final value of task B's computation module is returned.
In another aspect, the present invention provides a computer task scheduling device for machine learning modeling, including:
the computation modules, wherein each computation task is programmed in one computation module with its run function as the entry point, and the run function is called to execute the computation task;
the state cache module, which consists of a number of cache files, each computation task corresponding to one cache file;
and the task module, which comprises a require function used for scheduling the computation tasks in the computation modules.
In another aspect, the present invention provides an electronic device comprising a processor, a storage medium, and a computer program stored in the storage medium, which when executed by the processor implements the computer task scheduling method for machine learning modeling according to any of claims 1 to 7.
In another aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the computer task scheduling method for machine learning modeling according to any of claims 1 to 7.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a programming interface which is easy to understand and use for the calculation tasks, automatically acquires the dependency relationship of the calculation tasks from the program codes, records and tracks the change of the program codes, the data files and the dependency relationship, determines whether each calculation task needs to be executed according to the dependency relationship, obtains the execution plan, executes each calculation task according to the execution plan, and overcomes the defect of the prior calculation task scheduling mode. Compared with the prior art that the task scheduling method focuses on scheduling priority and other problems, the task scheduling method is mainly used for solving the task scheduling related to various changing situations.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is the process by which task A acquires the return value of task B when task B is executed or re-executed in the present invention.
FIG. 2 is the process by which task A obtains the return value of task B without re-executing task B in the present invention.
FIG. 3 is a flow chart of a calculation method of module A in the present invention.
FIG. 4 is a flowchart illustrating the manner in which task B is executed according to the present invention.
FIG. 5 is a flow chart of the require function of the present invention.
FIG. 6 is a flow chart of the Tarjan Hash algorithm in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
This embodiment provides a computer task scheduling method for machine learning modeling, aiming to overcome the difficulty manual scheduling has in handling many computation tasks and complex dependencies. The embodiment is implemented in the Python programming language; the same method can also be applied in other programming languages.
The computation required by a machine learning modeling project is generally split into multiple computation tasks with dependency relationships: a task can only be executed after the tasks it depends on have finished. (For example, if computation task C depends on computation tasks A and B, then C must be executed after A and B have completed.) The program corresponding to each computation task, and the dependencies between tasks, may change as the project progresses. A changed computation task needs to be re-executed, and if a depended-on task is re-executed, every task that depends on it should also be re-executed.
The embodiment comprises the following steps:
s1, establishing a task module, wherein the task module comprises a require function, and the require function is used for calculating the scheduling of the task. And establishing a state cache module, wherein the state cache module comprises a plurality of cache files and stores the cache files in a file system. Each computing task corresponds to one cache file, each computing task serializes all optional parameters by acquiring a path of a run function, calculates hash values of the parameters by using an SHA-1 algorithm, connects the path with the hash values of the parameters to obtain a unique ID of the computing task, and can find out the uniquely corresponding state cache file through the unique ID.
S2, defining the computation task to be executed as task A and the computation task on which task A depends as task B, writing each computation task in a computation module, and calling the run function in the module as the entry point to execute the computation task. The run function may take optional parameters; calls to the same run function with different parameters are treated as different computation tasks. The result of the computation task is returned as the run function's return value. The parameters and return value of the run function should be serializable: a value is serializable if it can be converted into a byte string that corresponds one-to-one with the original value, and the byte string can be converted back into the original value.
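In Python, this notion of serializability corresponds, for example, to pickle's round trip (the choice of pickle and the sample parameter names are assumptions of this illustration; the embodiment only requires a reversible value-to-byte-string conversion):

```python
import pickle

# Hypothetical run() parameters for a modeling task
params = {"learning_rate": 0.01, "epochs": 10}

blob = pickle.dumps(params)    # value -> byte string
restored = pickle.loads(blob)  # byte string -> original value
```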
And S3, task A imports the task module.
And S4, task A imports the computation module containing task B.
S5, task A calls the require function in the task module, passing the computation module containing task B as a parameter.
And S6, the require function schedules task B: it determines whether task B needs to be executed or re-executed, obtains the return value of task B's run function, and returns it to task A.
If task B needs to be executed:
the require function calls task B's run function, writes the return value into the state cache file, then reads task B's return value from the state cache and returns it to task A.
Alternatively, the require function accepts a number of serializable parameters from task A and passes them to task B's run function. If different parameters are passed in different calls, the require function determines separately whether task B needs to be executed and writes the return values into separate state cache files. A computation task may declare the data files it depends on; if the content of a data file changes, the computation task needs to be re-executed. A computation task is executed if it satisfies any one of the following conditions:
1. the computation task has not been executed, i.e. no return value of its run function is stored in the cache;
2. the program code file corresponding to the computation task has changed, i.e. differs from the last execution;
3. a data file corresponding to the computation task has changed, i.e. differs from the last execution;
4. the dependency relationships of the computation task have changed (if this condition holds, condition 2 necessarily also holds, so it need not be considered separately);
5. recursively, a computation task that this computation task depends on satisfies any of the above conditions.
In this embodiment, the user calls the require function, importing the module containing the computation task whose result is needed (usually the computation task that produces the final result, or the one most recently developed); the require function executes computation tasks as needed and returns the final return value.
The parameters of the require function are the module containing the run function of the computation task to be executed, plus a number of optional parameters that are passed to the run function and distinguish computation tasks. The require function executes the computation task as needed, puts the return value into the state cache, and returns the value finally obtained.
The judgment process of the require function comprises the following steps:
S61, obtaining the path of the run function in the computation module of task B, serializing all optional parameters, computing the hash value of the parameters with the SHA-1 algorithm, and concatenating the two to obtain the ID of task B;
S62, if the dependency digest value of the computation module has not been computed in the current run, obtaining the dependency digest value of the computation task using the Tarjan Hash algorithm; if it has already been computed in the current run, going directly to step S63;
S63, looking up the cache file corresponding to the computation task in the state cache module using the unique ID; if the cache file does not exist, creating and locking it, marking task B as needing execution, and going to step S64; if the cache file exists, opening and locking it, reading the dependency digest value from it, and comparing the cached dependency digest value with the one from the current run; if the values differ, marking task B as needing execution and going to step S64; if they are the same, marking task B as not needing execution and going to step S65;
S64, calling the run function of task B, writing the current run's dependency digest value into the cache file, and writing the run function's return value into task B's cache file;
and S65, unlocking the cache file and returning the return value.
This embodiment uses an algorithm to generate a hash value over all program code and data files that a computation task depends on, called the dependency digest value. The dependency digest value is used to determine whether any dependency of the computation task has changed: if the dependency digest value changes, the computation task needs to be re-executed.
Based on the constraints of the programming interface, if one computation task (task A) depends on another computation task (task B), task A must import the module containing task B.
Any change associated with task B changes task B's dependency digest value. Through the dependency relationships among modules, the change propagates to task A, changing task A's dependency digest value.
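This propagation can be illustrated with a tiny two-module chain (a hypothetical, cycle-free illustration: each digest simply folds in the digests of the modules it imports; the full Tarjan Hash additionally handles cyclic imports):

```python
import hashlib

def sha1(text):
    return hashlib.sha1(text.encode()).hexdigest()

def digest_a(code_a, code_b):
    # Task A imports module B, so A's dependency digest value folds in
    # B's digest; any change to B therefore changes A's digest too.
    h_b = sha1(code_b)
    return sha1(sha1(code_a) + h_b)
```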
The Tarjan Hash algorithm is derived from the Tarjan SCC algorithm (literature: Tarjan, R. E., Depth-first search and linear graph algorithms, SIAM Journal on Computing, 1972, 1(2): 146-).
The computation of the dependency digest value comprises the following steps:
step a, establishing two stack data structures, S_search and S_member;
step b, pushing the computation module of task B onto both S_search and S_member;
step c, examining the module m on top of S_search; if module m has not been visited, going to step d; if module m has been visited, going to step e;
step d, marking module m as visited; setting low(m), the earliest-visited module reachable from m, to m itself; traversing all unvisited modules m' that module m depends on, pushing each m' onto S_search and S_member, and returning to step c;
step e, popping module m from S_search and establishing a string h_deps for it; computing the hash value of module m's program code file with the SHA-1 algorithm and appending it to m's h_deps; traversing the data files f of module m, computing the hash value of each with the SHA-1 algorithm and appending it to m's h_deps; traversing all modules m' that module m depends on: if module m' has already produced a hash value h_final, appending the h_final of m' to m's h_deps; if module m' has not yet produced a hash value, updating low(m) according to low(m'), the earliest-visited module reachable from m'; if low(m) is not m itself, returning to step c; if low(m) is m itself, going to step f;
step f, establishing a sequence L; traversing all modules m' in S_member from the top down to m and appending the h_deps of each m' to L; sorting the h_deps strings in L; computing the hash value h of all strings in L with the SHA-1 algorithm; traversing all modules m' in S_member from the top down to module m and setting the h_final of each m' to h; and popping modules from S_member until module m has been popped;
and step g, if S_search is not empty, returning to step c; otherwise the algorithm ends, the h_final value of each module is recorded as that module's dependency digest value, and the h_final value of task B's computation module is returned.
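A compact rendering of the algorithm above (a sketch, with assumptions: modules are dictionary entries whose "code" and "data" strings stand in for file contents, and the standard recursive form of Tarjan's SCC search replaces the explicit S_search stack; modules in the same import cycle share one digest):

```python
import hashlib

def sha1(text):
    return hashlib.sha1(text.encode()).hexdigest()

def dependency_digests(modules, root):
    """modules: name -> {"code": str, "data": [str], "deps": [name]}.
    Returns name -> h_final (the dependency digest value) for every
    module reachable from root."""
    index, low, h_deps, h_final = {}, {}, {}, {}
    stack, on_stack, counter = [], set(), [0]

    def visit(m):
        index[m] = low[m] = counter[0]
        counter[0] += 1
        stack.append(m)
        on_stack.add(m)
        parts = [sha1(modules[m]["code"])]               # own code file hash
        parts += [sha1(d) for d in modules[m]["data"]]   # data file hashes
        for dep in modules[m]["deps"]:
            if dep not in index:
                visit(dep)
                low[m] = min(low[m], low[dep])
            elif dep in on_stack:
                low[m] = min(low[m], index[dep])
            if dep in h_final:        # finished dependency: fold in its h_final
                parts.append(h_final[dep])
        h_deps[m] = "".join(parts)
        if low[m] == index[m]:        # m closes a strongly connected component
            scc = []
            while True:
                x = stack.pop()
                on_stack.discard(x)
                scc.append(x)
                if x == m:
                    break
            # every member of the cycle gets the same digest; sorting the
            # h_deps strings makes the result order-independent
            h = sha1("".join(sorted(h_deps[x] for x in scc)))
            for x in scc:
                h_final[x] = h

    visit(root)
    return h_final
```

With modules B, C, D importing each other in a cycle, all three fall into one strongly connected component and receive an identical digest, and a change to any member's code changes the digest of all of them.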
The Tarjan Hash algorithm is further explained below with an example. The objective is to compute the dependency digest value of task B, and the steps of the Tarjan Hash algorithm are as follows:
step a, establishing two stack data structures, S_search and S_member;
step b, pushing the computation module of task B (denoted module B) onto S_search and S_member;
step c, the module on top of S_search is module B, which has not been visited; going to step d;
step d, marking module B as visited, with low(B) being module B; the module that module B depends on is module C, which has not been visited, so module C is pushed onto S_search and S_member and is now at the top of the stack; returning to step c;
step c, the module on top of S_search is module C, which has not been visited; going to step d;
step d, marking module C as visited, with low(C) being module C; the module that module C depends on is module D, which has not been visited, so module D is pushed onto S_search and S_member and is now at the top of the stack; returning to step c;
step c, the module on top of S_search is module D, which has not been visited; going to step d;
step d, marking module D as visited, with low(D) being module D; the module that module D depends on is module B, which has already been visited and is in the stack, so no new module is pushed onto S_search or S_member, and the module on top remains module D; returning to step c;
step c, the module on top of S_search is module D, which has been visited; going to step e;
step e, popping module D from S_search and establishing a string h_deps for module D; the module that module D depends on is module B, which has not yet produced a dependency digest value; low(B) is module B, so low(D) is updated to module B; since low(D) is not module D itself, returning to step c;
step c, the module on top of S_search is module C, which has been visited; going to step e;
step e, popping module C from S_search and establishing a string h_deps for module C; the module that module C depends on is module D, which has not yet produced a dependency digest value; low(D) is module B, so low(C) is updated to module B; since low(C) is not module C itself, returning to step c;
step c, the module on top of S_search is module B, which has been visited; going to step e;
step e, popping module B from S_search and establishing a string h_deps for module B; the module that module B depends on is module C; low(C) is module B and low(B) is module B, so low(B) is module B itself; going to step f;
and step f, traversing all modules in S_member starting from module B, namely all modules from the top of S_member down to module B; S_member holds, from the top, module D, module C and module B; the h_deps strings of modules D, C and B are sorted, the hash value of all the strings is computed, and modules D, C and B are popped in turn from the top of S_member, so that the dependency digest values of modules D, C and B are all the hash value just computed.
And step g, since S_search is now empty, the algorithm ends.
This embodiment further provides a computer task scheduling device for machine learning modeling, comprising:
the computation modules, wherein each computation task is programmed in one computation module with its run function as the entry point, and the run function is called to execute the computation task;
the state cache module, which consists of a number of cache files, each computation task corresponding to one cache file;
and the task module, which comprises a require function used for scheduling the computation tasks in the computation modules.
The present embodiment also provides an electronic device, which includes a processor, a storage medium, and a computer program, where the computer program is stored in the storage medium, and when the computer program is executed by the processor, the method for scheduling a computer task facing machine learning modeling according to the first aspect is implemented.
The present embodiment also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the computer task scheduling method for machine learning modeling according to the first aspect.
Although the present invention has been described in detail with respect to the above embodiments, it will be understood by those skilled in the art that modifications or improvements based on the disclosure of the present invention may be made without departing from the spirit and scope of the invention, and these modifications and improvements are within the spirit and scope of the invention.

Claims (10)

1. A computer task scheduling method for machine learning modeling is characterized by comprising the following steps:
S1, establishing a task module, wherein the task module comprises a require function, and the require function is used for scheduling computing tasks;
S2, defining the computing task to be executed as task A and the computing task on which task A depends as task B, wherein each computing task is programmed in a computing module and is executed by calling a run function in the computing module;
S3, task A imports the task module;
S4, task A imports the computing module where task B is located;
S5, task A calls the require function in the task module, passing in the computing module where task B is located as a parameter;
and S6, the require function invokes the computation of task B and returns the computation result to task A.
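As an informal illustration of steps S1 to S6 (not part of the claims), a minimal Python sketch might look like this; `require`, `task_b` and the parameter `n` are illustrative stand-ins for the task module's require function and the computing module of task B.

```python
# Illustrative sketch of steps S1-S6; `require` stands in for the task
# module's require function and `task_b` for the computing module of task B.

def require(module, **params):
    """S5/S6: schedule a computing task by calling the run function of
    the computing module passed in, returning its result to the caller
    (the caching described later in the document is omitted here)."""
    return module.run(**params)

class task_b:
    """Stands in for the imported computing module of task B (S4)."""
    @staticmethod
    def run(n=10):
        # The computing task itself: produce an illustrative result.
        return list(range(n))

# Task A (S3-S6): pass the computing module of task B to require and
# receive task B's computation result.
result = require(task_b, n=3)
```

The computing module itself never needs to know who scheduled it; all coordination goes through the require function.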
2. The method of claim 1, wherein the run function of each computing task has optional parameters; calling the same run function with different parameters is regarded as different computing tasks, and the parameters and return value of the run function are serializable.
3. The computer task scheduling method oriented to machine learning modeling according to claim 1, wherein each computing task corresponds to a unique ID and a unique cache file, and the cache files are stored in the state cache module.
4. The computer task scheduling method oriented to machine learning modeling according to claim 3, wherein each computing task obtains its unique ID by obtaining the path of its own run function, serializing all optional parameters and computing the hash value of the parameters using the SHA-1 algorithm, and concatenating the path with the hash value of the parameters.
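A hedged Python sketch of the ID construction in claim 4; the JSON serialization and the `#` separator are illustrative choices not fixed by the claim.

```python
import hashlib
import inspect
import json

def task_id(run_func, **params):
    """Unique ID per claim 4: the path of the run function joined with
    the SHA-1 digest of its serialized optional parameters; the JSON
    serialization and '#' separator are illustrative choices."""
    path = inspect.getfile(run_func)                 # path of the run function
    serialized = json.dumps(params, sort_keys=True)  # deterministic ordering
    phash = hashlib.sha1(serialized.encode("utf-8")).hexdigest()
    return path + "#" + phash

def run(n=10):
    """An illustrative computing task."""
    return list(range(n))

# The same run function called with different parameters is regarded
# as different computing tasks (claim 2), so the IDs differ.
id_a = task_id(run, n=3)
id_b = task_id(run, n=4)
```

Sorting the keys during serialization keeps the ID stable across runs regardless of the order in which parameters were supplied.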
5. The computer task scheduling method oriented to machine learning modeling according to claim 1, wherein in step S6, the require function obtains the return value of the run function in the computing module of task B.
6. The computer task scheduling method oriented to machine learning modeling according to claim 5, wherein in step S6, the processing procedure of the require function comprises the following steps:
S61, obtaining the path of the run function in the computing module of task B, serializing all optional parameters, computing the hash value of the parameters using the SHA-1 algorithm, and concatenating the two to obtain the ID of task B;
S62, if the dependency digest value of the computing module has not been calculated in the current run, obtaining the dependency digest value of the computing task using the Tarjan-based hash algorithm; if the dependency digest value of the computing module has already been calculated in the current run, going directly to step S63;
S63, searching the state cache module for the cache file corresponding to the computing task using the unique ID; if the cache file does not exist, creating and locking the cache file, marking task B as needing execution, and going to step S64; if the cache file exists, opening and locking the cache file, reading the dependency digest value from the cache file, and comparing it with the dependency digest value of the current run; if the two dependency digest values differ, marking task B as needing execution and going to step S64; if they are the same, marking task B as not needing execution and going to step S65;
S64, calling the run function of task B, writing the dependency digest value of the current run into the cache file, and writing the return value of the run function into the cache file of task B;
and S65, unlocking the cache file and returning the return value.
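Steps S61 to S65 can be sketched informally in Python as follows; file locking is omitted for brevity, and the cache location, serialization format and `digest` parameter are illustrative assumptions.

```python
import hashlib
import json
import os
import pickle
import tempfile

CACHE_DIR = tempfile.mkdtemp()  # illustrative stand-in for the state cache module

def require(module, digest, **params):
    """Sketch of steps S61-S65: look up the cache file by the task's
    unique ID and re-run the task only when the stored dependency
    digest value differs (file locking omitted for brevity)."""
    # S61: unique ID from the module's name and the SHA-1 hash of the
    # serialized optional parameters (claim 4 uses the run function's path).
    phash = hashlib.sha1(json.dumps(params, sort_keys=True).encode()).hexdigest()
    task_id = module.__name__ + "#" + phash
    path = os.path.join(CACHE_DIR, task_id)

    # S63: if a cache file exists and its digest matches, reuse the value.
    if os.path.exists(path):
        with open(path, "rb") as f:
            cached_digest, value = pickle.load(f)
        if cached_digest == digest:
            return value            # S65: cache hit

    # S64: run the task, then store the digest and the return value.
    value = module.run(**params)
    with open(path, "wb") as f:
        pickle.dump((digest, value), f)
    return value                    # S65

class square_task:                  # illustrative computing module
    calls = 0
    @staticmethod
    def run(x=2):
        square_task.calls += 1
        return x * x

first = require(square_task, digest="d1", x=5)   # executes run
second = require(square_task, digest="d1", x=5)  # served from cache
```

A real implementation would lock the cache file before reading or writing it, as the claim describes, so that concurrent runs do not race on the same task.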
7. The computer task scheduling method oriented to machine learning modeling according to claim 6, wherein the calculation process of the dependency digest value comprises the following steps:
step a, establishing two stack data structures, S_search and S_member;
step b, pushing the computing module of task B onto S_search and S_member respectively;
step c, examining the module m on the top of the S_search stack; if module m has not been visited, going to step d; if module m has been visited, going to step e;
step d, marking module m as visited; recording low(m), the earliest-visited module reachable from module m, initially module m itself; traversing all unvisited modules m' on which module m depends, pushing each m' onto S_search and S_member, and returning to step c;
step e, popping module m from S_search and establishing a string h_deps for module m; computing the hash value of the program code file of module m using the SHA-1 algorithm and appending it to the h_deps of module m; traversing each data file f of module m, computing the hash value of f using the SHA-1 algorithm, and appending it to the h_deps of module m; traversing all modules m' on which module m depends: if module m' has generated a hash value h_final, appending the h_final of module m' to the h_deps of m; if module m' has not generated a hash value, updating low(m), the earliest-visited module reachable from module m, according to low(m'); if low(m) is not m itself, returning to step c; if low(m) is m itself, going to step f;
step f, establishing a sequence L; traversing all modules m' from module m in S_member and adding the h_deps of each m' to L; sorting the h_deps strings in L; computing the hash value h of all strings in L using the SHA-1 algorithm; traversing all modules m' from module m in S_member and setting the h_final of each m' to h; popping modules from S_member until module m is popped;
and step g, if S_search is not empty, returning to step c; otherwise, ending the algorithm, recording the h_final values of all modules as their dependency digest values, and returning the h_final value of the computing module of task B.
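A hedged Python sketch of steps a to g; modules are represented by name, with `deps` and `code` dictionaries standing in for dependency lists and program/data files, and the guard against duplicate stack entries is a defensive addition not spelled out in the claim.

```python
import hashlib

def sha1(text):
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def dependency_digests(start, deps, code):
    """Sketch of claim 7 (steps a-g): a Tarjan-style traversal that
    gives every strongly connected component of modules one shared
    dependency digest value. `deps[m]` lists the modules m depends on;
    `code[m]` stands in for m's program and data files."""
    s_search = [start]          # steps a/b: the two stacks
    s_member = []
    index, low = {}, {}         # discovery order ("earliest visited")
    h_deps, h_final = {}, {}
    done = set()                # guard against duplicate pushes
    counter = 0

    while s_search:             # step c
        m = s_search[-1]
        if m not in index:      # step d: first visit
            index[m] = low[m] = counter
            counter += 1
            s_member.append(m)
            for d in deps[m]:
                if d not in index:
                    s_search.append(d)
        else:                   # step e: dependencies handled, pop m
            s_search.pop()
            if m in done:
                continue
            done.add(m)
            parts = [sha1(code[m])]          # hash of m's own files
            for d in deps[m]:
                if d in h_final:             # cross-component dependency
                    parts.append(h_final[d])
                else:                        # same component: update low
                    low[m] = min(low[m], low[d])
            h_deps[m] = "".join(parts)
            if low[m] == index[m]:           # step f: m is the SCC root
                scc = []
                while True:
                    x = s_member.pop()
                    scc.append(x)
                    if x == m:
                        break
                h = sha1("".join(sorted(h_deps[x] for x in scc)))
                for x in scc:
                    h_final[x] = h           # shared digest for the cycle
    return h_final               # step g

# Example: A and B depend on each other (a cycle); A also uses C.
deps = {"A": ["B", "C"], "B": ["A"], "C": []}
code = {"A": "code of A", "B": "code of B", "C": "code of C"}
digests = dependency_digests("A", deps, code)
```

Because A and B form a cycle, they receive the same digest, while C keeps its own; a change to any file in the cycle changes both digests at once.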
8. A computer task scheduling device for machine learning modeling, comprising:
computing modules, wherein each computing task is programmed in one computing module, a run function in the computing module serves as the entry point, and the run function is called to execute the computing task;
a state cache module, which consists of a number of cache files, each computing task corresponding to one cache file;
and a task module, which comprises a require function used for scheduling the computing tasks of the computing modules.
9. An electronic device comprising a processor, a storage medium, and a computer program, the computer program being stored in the storage medium, wherein the computer program, when executed by the processor, implements the computer task scheduling method for machine learning modeling according to any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method for computer task scheduling for machine-learning-modeling according to any one of claims 1 to 7.
CN202010242943.0A 2020-03-31 2020-03-31 Computer task scheduling method, device, equipment and medium for machine learning modeling Active CN111506402B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010242943.0A CN111506402B (en) 2020-03-31 2020-03-31 Computer task scheduling method, device, equipment and medium for machine learning modeling

Publications (2)

Publication Number Publication Date
CN111506402A true CN111506402A (en) 2020-08-07
CN111506402B CN111506402B (en) 2023-06-27

Family

ID=71867249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010242943.0A Active CN111506402B (en) 2020-03-31 2020-03-31 Computer task scheduling method, device, equipment and medium for machine learning modeling

Country Status (1)

Country Link
CN (1) CN111506402B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004094496A (en) * 2002-08-30 2004-03-25 Ntt Comware Corp Sequence diagram preparing device, and method therefor, sequence diagram preparing program, and recording medium therefor
CN102906696A (en) * 2010-03-26 2013-01-30 维尔图尔梅特里克斯公司 Fine grain performance resource management of computer systems
CN105956021A (en) * 2016-04-22 2016-09-21 华中科技大学 Automated task parallel method suitable for distributed machine learning and system thereof
CN106327056A (en) * 2016-08-08 2017-01-11 成都四威高科技产业园有限公司 Real time clock-based AGV task management system and method
CN107885587A (en) * 2017-11-17 2018-04-06 清华大学 A kind of executive plan generation method of big data analysis process
WO2018113642A1 (en) * 2016-12-20 2018-06-28 西安电子科技大学 Control flow hiding method and system oriented to remote computing
CN109814878A (en) * 2018-12-20 2019-05-28 中国电子科技集团公司第十五研究所 Cross-platform cross commercialization is from the complicated huge information system mixed deployment system of primary climate
US10409560B1 (en) * 2015-11-18 2019-09-10 Amazon Technologies, Inc. Acceleration techniques for graph analysis programs
CN110245003A (en) * 2019-06-06 2019-09-17 中信银行股份有限公司 A kind of machine learning uniprocessor algorithm arranging system and method
CN110895484A (en) * 2018-09-12 2020-03-20 北京奇虎科技有限公司 Task scheduling method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李新磊: "Cloud computing task scheduling based on dependent tasks and the Sarsa(λ) algorithm" *
陈廷伟; 张斌; 郝宪文: "Grid dependent task scheduling based on optimized selection of task-resource allocation graphs" *

Also Published As

Publication number Publication date
CN111506402B (en) 2023-06-27

Similar Documents

Publication Publication Date Title
US8171001B2 (en) Using a data mining algorithm to generate rules used to validate a selected region of a predicted column
US8166000B2 (en) Using a data mining algorithm to generate format rules used to validate data sets
JP3209163B2 (en) Classifier
US7340475B2 (en) Evaluating dynamic expressions in a modeling application
US20060010055A1 (en) Business evaluation supporting method
CN109299177A (en) Data pick-up method, apparatus, storage medium and electronic equipment
CN114638234B (en) Big data mining method and system applied to online business handling
US20220300280A1 (en) Predictive build quality assessment
EP3948501A1 (en) Hierarchical machine learning architecture including master engine supported by distributed light-weight real-time edge engines
CN115358397A (en) Parallel graph rule mining method and device based on data sampling
KR102582779B1 (en) Knowledge completion method and apparatus through neuro symbolic-based relation embeding
US6889219B2 (en) Method of tuning a decision network and a decision tree model
US20030126138A1 (en) Computer-implemented column mapping system and method
CN113705207A (en) Grammar error recognition method and device
CN111506402A (en) Computer task scheduling method, device, equipment and medium for machine learning modeling
CN116467219A (en) Test processing method and device
CN115309995A (en) Scientific and technological resource pushing method and device based on demand text
CN115345600A (en) RPA flow generation method and device
CN114791865A (en) Method, system and medium for detecting self-consistency of configuration items based on relational graph
US7305373B1 (en) Incremental reduced error pruning
US7523031B1 (en) Information processing apparatus and method capable of processing plurality type of input information
JP2000029899A (en) Matching method for building and map, and recording medium
CN117194275B (en) Automatic software automatic test plan generation method and system based on intelligent algorithm
JP2019159988A (en) Neural network device and program
US20220207416A1 (en) System and method of providing correction assistance on machine learning workflow predictions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant