CN111506402B - Computer task scheduling method, device, equipment and medium for machine learning modeling - Google Patents

Computer task scheduling method, device, equipment and medium for machine learning modeling

Info

Publication number
CN111506402B
CN111506402B CN202010242943.0A
Authority
CN
China
Prior art keywords
task
module
calculation
value
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010242943.0A
Other languages
Chinese (zh)
Other versions
CN111506402A (en)
Inventor
胡宸章
朱明杰
魏岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Creditx Information Technology Co ltd
Original Assignee
Shanghai Creditx Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Creditx Information Technology Co ltd filed Critical Shanghai Creditx Information Technology Co ltd
Priority to CN202010242943.0A priority Critical patent/CN111506402B/en
Publication of CN111506402A publication Critical patent/CN111506402A/en
Application granted granted Critical
Publication of CN111506402B publication Critical patent/CN111506402B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a computer task scheduling method, device, equipment and medium oriented to machine learning modeling, wherein the method comprises the following steps: S1, establishing a task module, the task module comprising a require function; S2, defining the calculation task to be executed as task A and a calculation task on which task A depends as task B, each calculation task being executed by calling the run function in its calculation module; S3, importing the task module into task A; S4, importing the calculation module where task B is located into task A; S5, task A calling the require function in the task module, passing in the calculation module where task B is located as a parameter; S6, the require function invoking the calculation result of task B and returning the calculation result to task A. The invention automatically acquires the dependency relationships of calculation tasks from the program code, records and tracks changes in program code, data files and dependency relationships, and determines whether each calculation task needs to be executed, overcoming the defects and shortcomings of existing calculation task scheduling approaches.

Description

Computer task scheduling method, device, equipment and medium for machine learning modeling
Technical Field
The invention belongs to the technical field of computer data processing, and particularly relates to a computer task scheduling method, device, equipment and medium for machine learning modeling.
Background
Machine learning modeling projects typically involve multiple computing tasks. In each computing task, the computer needs to execute the program corresponding to that task. The computing tasks have dependency relationships: a computing task can be executed only after the tasks it depends on have finished (e.g., if computing task C depends on computing tasks A and B, C must be executed after A and B have completed). The program corresponding to each computing task, and the dependency relationships between tasks, may change during the course of the project. A changed computing task needs to be re-executed, and if a computing task is re-executed, the tasks that depend on it should also be re-executed.
The existing scheduling approach is mainly manual: a person operates the computer to run the program corresponding to each computing task that needs to be executed. When there are many computing tasks and the dependency relationships are complex, repeated execution and missed execution easily occur.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a computer task scheduling method oriented to machine learning modeling, which uses a computer program to automatically manage the execution of computing tasks and overcomes the difficulty manual scheduling has in coping with large numbers of tasks and complex dependency relationships. The invention also provides a computer task scheduling device, equipment and medium for machine learning modeling.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
in a first aspect of the present invention, a method for scheduling computer tasks for machine learning modeling is provided, comprising the following steps:
S1, establishing a task module, wherein the task module comprises a require function used for scheduling calculation tasks;
S2, defining the calculation task to be executed as task A and a calculation task on which task A depends as task B, writing each calculation task into a calculation module, and executing the calculation task by calling the run function in the calculation module;
S3, importing the task module into task A;
S4, importing the calculation module where task B is located into task A;
S5, task A calls the require function in the task module, passing in the calculation module where task B is located as a parameter;
S6, the require function invokes the calculation result of task B and returns the calculation result to task A.
The run function of each calculation task has optional parameters; calling the same run function with different parameters is regarded as different calculation tasks. The parameters and return value of the run function are serializable.
Each computing task corresponds to a unique ID and a unique cache file, and the cache file is stored in the state cache module.
Each calculation task obtains its unique ID by taking the path of its run function, serializing all optional parameters, computing the hash value of the serialized parameters with the SHA-1 algorithm, and concatenating the path with that hash value.
In step S6, the require function obtains the return value of the run function in the calculation module of task B.
In step S6, the judging process of the require function comprises the following steps:
S61, obtaining the path of the run function in task B's calculation module, serializing all optional parameters, computing the hash value of the parameters with the SHA-1 algorithm, and concatenating the two to obtain the ID of task B;
S62, if the calculation module has not computed its dependency digest value in the current run, using the Tarjan Hash algorithm to obtain the dependency digest value of the calculation task; if the calculation module has already computed the dependency digest value in the current run, going directly to step S63;
S63, using the unique ID to look up the cache file corresponding to the calculation task in the state cache module; if the cache file does not exist, creating and locking it, marking that task B needs to be executed, and going to step S64; if the cache file exists, opening and locking it, reading the dependency digest value from the cache file, and comparing it with the dependency digest value of the current run; if the dependency digest values differ, marking that task B needs to be executed and going to step S64; if the dependency digest values are the same, marking that task B does not need to be executed and going to step S65;
S64, calling the run function of task B, writing the dependency digest value of the current run into the cache file, and writing the return value of the run function into task B's cache file;
S65, unlocking the cache file and returning the return value.
Wherein the calculation process of the dependency digest value comprises the following steps:
step a, establishing two stack data structures, S_search and S_module;
step b, pushing the calculation module of task B onto S_search and S_module respectively;
step c, examining the module m at the top of S_search; if module m has not been visited, going to step d; if module m has been visited, going to step e;
step d, marking module m as visited; recording low(m), the earliest-visited module reachable from module m, initialized to m itself; traversing all unvisited modules m' on which module m depends, pushing each m' onto S_search and S_module, and returning to step c;
step e, popping module m from S_search and creating a string h_deps for module m; computing the hash value of module m's program code file with the SHA-1 algorithm and appending it to the h_deps of module m; traversing each data file f of module m, computing the hash value of f with the SHA-1 algorithm, and appending it to the h_deps of module m; traversing all modules m' imported by module m: if module m' has generated a hash value h_final, appending the h_final of m' to the h_deps of module m; if module m' has not generated a hash value, updating low(m) according to low(m'), the earliest-visited module reachable from m'; if low(m) is not m itself, returning to step c; if low(m) is m itself, going to step f;
step f, establishing a sequence L; traversing all modules m' from the top of S_module down to m, adding the h_deps of each m' to L; sorting the h_deps strings in L; computing the hash value h of all strings in L with the SHA-1 algorithm; traversing all modules m' from the top of S_module down to m again, setting the h_final of each m' to h; and popping modules from S_module until module m is popped;
step g, if S_search is not empty, returning to step c; otherwise ending the algorithm. The h_final value of each module is that module's dependency digest value; these values are recorded, and the h_final value of task B's calculation module is returned.
In another aspect, the present invention provides a computer task scheduling device oriented to machine learning modeling, comprising:
a plurality of calculation modules, wherein each calculation task is written into one calculation module, the run function in the calculation module serves as the entry point, and the run function is called to execute the calculation task;
a state cache module, consisting of a plurality of cache files, wherein each computing task corresponds to one cache file;
and a task module, comprising a require function used for scheduling the calculation tasks of the calculation modules.
In another aspect, the present invention provides an electronic device comprising a processor, a storage medium, and a computer program stored in the storage medium which, when executed by the processor, implements the machine-learning-modeling-oriented computer task scheduling method of any one of claims 1 to 7.
In another aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the machine-learning-modeling-oriented computer task scheduling method of any one of claims 1 to 7.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a programming interface which is easy to understand and use for the calculation tasks, automatically acquires the dependency relationship of the calculation tasks from the program codes, records and tracks the program codes, the data files and the change of the dependency relationship, determines whether each calculation task needs to be executed according to the dependency relationship, obtains an execution plan, executes each calculation task according to the execution plan, and overcomes the defects of the existing calculation task scheduling mode. Compared with the prior art that the calculation task scheduling method focuses on scheduling priority and other problems, the method focuses on task scheduling related to various changing situations.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows the process of obtaining task B's return value when task B is executed or re-executed in the present invention.
Fig. 2 shows the process of obtaining task B's return value when task B is not re-executed in the present invention.
Fig. 3 is a flowchart of the calculation module of task A in the present invention.
Fig. 4 is a flowchart of the execution of task B in the present invention.
Fig. 5 is a flowchart of the require function in the present invention.
Fig. 6 is a flowchart of the Tarjan Hash algorithm in the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without any inventive effort, are intended to be within the scope of the invention.
This embodiment provides a computer task scheduling method oriented to machine learning modeling, to overcome the difficulty manual scheduling has in coping with large numbers of computing tasks and complex dependency relationships. The embodiment is implemented in the Python programming language; the same method can also be used in other programming languages.
The computation required by a machine learning modeling project is generally split into a plurality of computing tasks with dependency relationships: a computing task can be executed only after the tasks it depends on have finished (e.g., if computing task C depends on computing tasks A and B, C must be executed after A and B have completed). The program corresponding to each computing task, and the dependency relationships between tasks, may change during the course of the project. A changed computing task needs to be re-executed, and if a computing task is re-executed, the tasks that depend on it should also be re-executed.
The embodiment comprises the following steps:
s1, a task module is established, wherein the task module comprises a requiring function, and the requiring function is used for calculating task scheduling. A state buffer module is established, wherein the state buffer module comprises a plurality of buffer files, and the buffer files are stored in a file system. Each calculation task corresponds to a cache file, each calculation task sequences all optional parameters by acquiring a path of a run function, calculates hash values of the parameters by using an SHA-1 algorithm, connects the path with the hash values of the parameters to obtain a unique ID of the calculation task, and can find a unique corresponding state cache file through the unique ID.
S2, the calculation task to be executed is defined as task A, and a calculation task on which task A depends is defined as task B. Each calculation task is written into a calculation module, with the run function in the module as the entry point; the run function is called to execute the calculation task. Run functions have optional parameters, and calling the same run function with different parameters is regarded as different calculation tasks. The result of the calculation task is returned through the return value of the run function. The parameters and return value of the run function should be serializable: a value is serializable if it can be converted into a byte string consisting of several bytes, in one-to-one correspondence with the original value, and the converted byte string can be converted back into the original value.
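A minimal calculation module of the kind described above might look like the sketch below (the `run` body is illustrative, not from the patent); the round trip through `pickle` demonstrates what "serializable" requires, under the assumption that Python's pickle format is the serialization in use.

```python
import pickle

# Entry point of a calculation module: optional, serializable parameters
# in, a serializable result out.
def run(n=10):
    return sum(i * i for i in range(n))

# "Serializable" means a lossless round trip through a byte string:
params = (10,)
assert pickle.loads(pickle.dumps(params)) == params
result = run(*params)
assert pickle.loads(pickle.dumps(result)) == result
```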
S3, the task module is imported into task A.
S4, the calculation module where task B is located is imported into task A.
S5, task A calls the require function in the task module, passing in the calculation module where task B is located as a parameter.
S6, the require function invokes task B: it judges whether task B needs to be executed or re-executed, obtains the return value of task B's run function, and returns it to task A.
If task B needs to be executed:
the require function calls the run function of task B, writes the return value into the state cache file, reads task B's return value from the state cache, and returns it to task A.
Optionally, the require function accepts several serializable parameters from task A and passes them to task B's run function. If different parameters are passed in different calls, the require function separately determines whether task B needs to be executed and writes the return values into different state cache files. A computing task may declare the data files it depends on; if the contents of a data file change, the computing task needs to be re-executed. A computing task is executed if it meets any one of the following conditions:
1. the calculation task has never been executed, i.e., no return value of its run function is found in the cache;
2. the program code file corresponding to the calculation task has changed, i.e., it differs from the program code file at the last execution;
3. a data file corresponding to the calculation task has changed, i.e., it differs from the data file at the last execution;
4. the dependencies of the computing task have changed (if this condition holds, condition 2 necessarily holds as well, so it need not be considered separately);
5. recursively, a computing task on which this computing task depends satisfies any one of the above conditions.
In this embodiment, the user calls the require function, passing in the module of the computing task whose result is needed (usually the computing task that produces the final result, or the most recently developed computing task); the require function executes computing tasks as needed and returns the final return value.
The parameters of the require function are the module containing the run function of the computing task to be executed, plus several optional parameters that are passed to the run function and that distinguish computing tasks. The method executes computing tasks on demand, puts return values into the state cache, and returns the value that was ultimately requested.
The judging process of the require function comprises the following steps:
S61, obtaining the path of the run function in task B's calculation module, serializing all optional parameters, computing the hash value of the parameters with the SHA-1 algorithm, and concatenating the two to obtain the ID of task B;
S62, if the calculation module has not computed its dependency digest value in the current run, using the Tarjan Hash algorithm to obtain the dependency digest value of the calculation task; if the calculation module has already computed the dependency digest value in the current run, going directly to step S63;
S63, using the unique ID to look up the cache file corresponding to the calculation task in the state cache module; if the cache file does not exist, creating and locking it, marking that task B needs to be executed, and going to step S64; if the cache file exists, opening and locking it, reading the dependency digest value from the cache file, and comparing it with the dependency digest value of the current run; if the dependency digest values differ, marking that task B needs to be executed and going to step S64; if the dependency digest values are the same, marking that task B does not need to be executed and going to step S65;
S64, calling the run function of task B, writing the dependency digest value of the current run into the cache file, and writing the return value of the run function into task B's cache file;
S65, unlocking the cache file and returning the return value.
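Steps S61 to S65 can be sketched as follows. This is a simplified, single-process version under stated assumptions: cache-file locking, the Tarjan Hash computation of step S62, and error handling are omitted, `pickle` is assumed as the serialization, and all names are hypothetical.

```python
import hashlib
import os
import pickle
import tempfile

CACHE_DIR = tempfile.mkdtemp()   # stand-in for the state cache module

def require(module, *args, digest):
    """Look up the task's cache file by its unique ID, compare the stored
    dependency digest value with the current one, and re-run the task
    only when they differ (or when no cache entry exists yet)."""
    # S61: module path + SHA-1 of serialized parameters -> unique ID
    task_id = module.__name__ + "_" + hashlib.sha1(pickle.dumps(args)).hexdigest()
    path = os.path.join(CACHE_DIR, task_id)
    if os.path.exists(path):                     # S63: cache file found
        with open(path, "rb") as f:
            cached = pickle.load(f)
        if cached["digest"] == digest:           # digests match:
            return cached["value"]               # S65: return cached value
    value = module.run(*args)                    # S64: (re-)execute task B
    with open(path, "wb") as f:                  # write digest and value
        pickle.dump({"digest": digest, "value": value}, f)
    return value
```

Here `digest` stands for the dependency digest value that step S62 would obtain from the Tarjan Hash algorithm; passing a different digest simulates a change in code or data files and forces re-execution.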
This embodiment uses an algorithm to generate a hash value of all the program code and data files on which a computing task depends, called the dependency digest value. The dependency digest value is used to determine whether any dependency of the computing task has changed: if the dependency digest value changes, the computing task needs to be re-executed.
Based on the constraints of the programming interface, if one computing task (task A) depends on another computing task (task B), the module in which task B resides must be imported into task A.
A change associated with task B changes task B's dependency digest value. Through the dependency relationships among the modules, the change propagates to task A, changing task A's dependency digest value as well.
The Tarjan Hash algorithm is derived from the Tarjan SCC algorithm (Tarjan, R. E., Depth-first search and linear graph algorithms, SIAM Journal on Computing, 1972, 1(2): 146-160).
The calculation process of the dependency digest value comprises the following steps:
step a, establishing two stack data structures, S_search and S_module;
step b, pushing the calculation module of task B onto S_search and S_module respectively;
step c, examining the module m at the top of S_search; if module m has not been visited, going to step d; if module m has been visited, going to step e;
step d, marking module m as visited; recording low(m), the earliest-visited module reachable from module m, initialized to m itself; traversing all unvisited modules m' on which module m depends, pushing each m' onto S_search and S_module, and returning to step c;
step e, popping module m from S_search and creating a string h_deps for module m; computing the hash value of module m's program code file with the SHA-1 algorithm and appending it to the h_deps of module m; traversing each data file f of module m, computing the hash value of f with the SHA-1 algorithm, and appending it to the h_deps of module m; traversing all modules m' imported by module m: if module m' has generated a hash value h_final, appending the h_final of m' to the h_deps of module m; if module m' has not generated a hash value, updating low(m) according to low(m'), the earliest-visited module reachable from m'; if low(m) is not m itself, returning to step c; if low(m) is m itself, going to step f;
step f, establishing a sequence L; traversing all modules m' from the top of S_module down to m, adding the h_deps of each m' to L; sorting the h_deps strings in L; computing the hash value h of all strings in L with the SHA-1 algorithm; traversing all modules m' from the top of S_module down to m again, setting the h_final of each m' to h; and popping modules from S_module until module m is popped;
step g, if S_search is not empty, returning to step c; otherwise ending the algorithm. The h_final value of each module is that module's dependency digest value; these values are recorded, and the h_final value of task B's calculation module is returned.
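The steps above can be condensed into a short sketch. This is a simplification under assumptions: the dependency graph and each module's "code" are given as in-memory dictionaries rather than read from files on disk, data files are folded into the code string, and a recursive depth-first search replaces the patent's explicit S_search/S_module stacks. The core property is preserved: modules in the same strongly connected component (an import cycle) share one digest.

```python
import hashlib

def dependency_digests(root, deps, code):
    """Tarjan-style dependency digests: each strongly connected component
    gets one digest built from the sorted per-module hashes of its own
    code plus the digests of the components it depends on."""
    def sha1(s):
        return hashlib.sha1(s.encode()).hexdigest()

    index, low, on_stack, stack = {}, {}, set(), []
    h_final, counter = {}, [0]

    def visit(m):
        index[m] = low[m] = counter[0]
        counter[0] += 1
        stack.append(m)
        on_stack.add(m)
        for d in deps.get(m, []):
            if d not in index:
                visit(d)
                low[m] = min(low[m], low[d])
            elif d in on_stack:                  # back edge inside a cycle
                low[m] = min(low[m], index[d])
        if low[m] == index[m]:                   # m is the root of its SCC
            scc = []
            while True:
                x = stack.pop()
                on_stack.discard(x)
                scc.append(x)
                if x == m:
                    break
            # h_deps per member: hash of own code + h_final of dependencies
            # that already have a digest (i.e. dependencies outside this SCC)
            parts = sorted(
                "".join([sha1(code[x])] +
                        sorted(h_final[d] for d in deps.get(x, [])
                               if d in h_final))
                for x in scc)
            digest = sha1("".join(parts))
            for x in scc:
                h_final[x] = digest              # the whole cycle shares it

    visit(root)
    return h_final

# Mirroring the worked example below: B -> C -> D -> B form one cycle,
# so all three modules receive the same dependency digest value.
digests = dependency_digests(
    "B",
    {"B": ["C"], "C": ["D"], "D": ["B"]},
    {"B": "code of B", "C": "code of C", "D": "code of D"})
assert digests["B"] == digests["C"] == digests["D"]
```

Changing any module's code (or, in the real system, any declared data file) changes its component's digest, and the change propagates to every module that imports it.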
The Tarjan Hash algorithm is further explained below with an example whose aim is to calculate the dependency digest value of task B; the following steps follow the Tarjan Hash algorithm:
step a, two stack data structures S_search and S_module are established;
step b, the calculation module of task B (denoted module B) is pushed onto S_search and S_module;
step c, the module at the top of S_search is module B, and module B has not been visited, so go to step d;
step d, mark module B as visited and set low(B) to module B; suppose the module on which module B depends is module C and module C has not been visited; push module C onto S_search and S_module, so that module C is at the top of the stacks; return to step c;
step c, the module at the top of S_search is module C, and module C has not been visited, so go to step d;
step d, mark module C as visited and set low(C) to module C; suppose the module on which module C depends is module D and module D has not been visited; push module D onto S_search and S_module, so that module D is at the top of the stacks; return to step c;
step c, the module at the top of S_search is module D, and module D has not been visited, so go to step d;
step d, mark module D as visited and set low(D) to module D; suppose the module on which module D depends is module B; module B has been visited and is in the stack, so no new module is pushed onto S_search or S_module, and the module at the top of the stack is still module D; return to step c;
step c, the module at the top of S_search is module D, and module D has been visited, so go to step e;
step e, pop module D from S_search and create the string h_deps for module D; the module on which module D depends is module B, and module B has not generated a dependency digest value; since low(B) is module B, update low(D) to module B; low(D) is not module D itself, so return to step c;
step c, the module at the top of S_search is module C, and module C has been visited, so go to step e;
step e, pop module C from S_search and create the string h_deps for module C; the module on which module C depends is module D, and module D has not generated a dependency digest value; since low(D) is module B, update low(C) to module B; low(C) is not module C itself, so return to step c;
step c, the module at the top of S_search is module B, and module B has been visited, so go to step e;
step e, pop module B from S_search and create the string h_deps for module B; the module on which module B depends is module C, and low(C) is module B; low(B) is module B itself, so go to step f;
step f, traverse all modules from module B in S_module, i.e., all modules from the top of S_module down to module B; from the top, S_module contains module D, module C and module B in order; sort the h_deps strings of modules D, C and B and compute the hash value of all the strings; then pop modules D, C and B in turn from the top of S_module; the dependency digest values of modules D, C and B are all this computed hash value.
step g, S_search is empty, so the algorithm ends.
This embodiment also provides a computer task scheduling device oriented to machine learning modeling, comprising:
a plurality of calculation modules, wherein each calculation task is written into one calculation module, the run function in the calculation module serves as the entry point, and the run function is called to execute the calculation task;
a state cache module, consisting of a plurality of cache files, wherein each computing task corresponds to one cache file;
and a task module, comprising a require function used for scheduling the calculation tasks of the calculation modules.
This embodiment also provides an electronic device comprising a processor, a storage medium, and a computer program stored in the storage medium which, when executed by the processor, implements the machine-learning-modeling-oriented computer task scheduling method of the first aspect.
In another aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the machine-learning-modeling-oriented computer task scheduling method of the first aspect described above.
While the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications and improvements can be made on the basis of this disclosure without departing from the spirit and scope of the invention.

Claims (7)

1. A computer task scheduling method oriented to machine learning modeling, characterized by comprising the following steps:
S1, establishing a task module, wherein the task module comprises a require function, and the require function is used for the scheduling of calculation tasks;
S2, recording a calculation task to be executed as task A and a calculation task on which task A depends as task B, wherein each calculation task is written in one calculation module, and the calculation task is executed by calling a run function in the calculation module;
S3, task A imports the task module;
S4, task A imports the calculation module where task B is located;
S5, task A calls the require function in the task module, passing in the calculation module where task B is located as a parameter;
S6, the require function retrieves the calculation result of task B and returns it to task A;
in step S6, the require function obtains the return value of the run function in the calculation module of task B;
the decision process of the require function comprises the following steps:
S61, obtaining the path of the run function in the calculation module of task B, serializing all optional parameters, calculating the hash value of the parameters using the SHA-1 algorithm, and concatenating the two to obtain the ID of task B;
S62, if the calculation module has not calculated a dependency digest value in the current run, obtaining the dependency digest value of the calculation task using the Tarjan hash algorithm; if the calculation module has already calculated the dependency digest value in the current run, proceeding directly to step S63;
S63, searching the state cache module for the cache file corresponding to the calculation task using the unique ID; if the cache file does not exist, creating and locking the cache file, marking that task B needs to be executed, and proceeding to step S64; if the cache file exists, opening and locking the cache file, reading the dependency digest value from the cache file, and comparing the dependency digest value in the cache file with the dependency digest value of the current run; if the dependency digest values differ, marking that task B needs to be executed, and proceeding to step S64; if the dependency digest values are the same, marking that task B does not need to be executed, and proceeding to step S65;
S64, calling the run function of task B, writing the dependency digest value of the current run into the cache file, and writing the return value of the run function into the cache file of task B;
S65, unlocking the cache file and returning the return value;
wherein the calculation process of the dependency digest value comprises the following steps:
step a, establishing two stack data structures, S_search and S_module;
step b, pushing the calculation module of task B onto S_search and onto S_module;
step c, examining the module m at the top of S_search; if module m has not been visited, proceeding to step d; if module m has already been visited, proceeding to step e;
step d, marking module m as visited; recording the earliest-visited module low(m) reachable from module m, initially m itself; traversing all unvisited modules m' on which module m depends, pushing each m' onto S_search and S_module, and returning to step c;
step e, popping module m from S_search and establishing a string collection h_deps for module m; calculating the hash value of the program code file of module m using the SHA-1 algorithm and adding it to h_deps of module m; traversing each data file f of module m, calculating the hash value of f using the SHA-1 algorithm and adding it to h_deps of module m; traversing all modules m' imported by module m: if module m' has generated a hash value h_final, adding h_final of m' to h_deps of module m; if module m' has not generated a hash value, updating the earliest-visited module low(m) reachable from module m according to the earliest-visited module low(m') reachable from m'; if low(m) is not m itself, returning to step c; if low(m) is m itself, proceeding to step f;
step f, establishing a sequence L; traversing all modules m' from the top of S_module down to m and adding h_deps of each m' to L; sorting the h_deps strings in L and calculating the hash value h of all the strings in L using the SHA-1 algorithm; traversing the same modules m' again, setting h_final of each m' to h, and popping modules from S_module until module m has been popped;
step g, if S_search is not empty, returning to step c; otherwise ending the algorithm, wherein the h_final value of each module is the dependency digest value of that module; recording these values and returning the h_final value of the calculation module of task B.
2. The machine-learning-modeling-oriented computer task scheduling method of claim 1, wherein the run function of each calculation task has optional parameters; the same run function called with different parameters is regarded as a different calculation task; and the parameters and the return value of the run function are serializable.
3. The machine-learning-modeling-oriented computer task scheduling method of claim 1, wherein each calculation task corresponds to a unique ID and a unique cache file, and the cache files are stored in the state cache module.
4. The machine-learning-modeling-oriented computer task scheduling method according to claim 3, wherein each calculation task obtains its unique ID by obtaining the path of its own run function, serializing all optional parameters and calculating the hash value of the parameters using the SHA-1 algorithm, and concatenating the path with the hash value of the parameters.
5. A machine-learning-modeling-oriented computer task scheduling device, applied to the machine-learning-modeling-oriented computer task scheduling method of any one of claims 1 to 4, comprising:
a plurality of calculation modules, wherein each calculation task is written in one calculation module, a run function in the calculation module serves as the entry point, and the run function is called to execute the calculation task;
a state cache module, consisting of a plurality of cache files, wherein each calculation task corresponds to one cache file;
and a task module, comprising a require function used for scheduling the calculation tasks of the calculation modules.
6. An electronic device comprising a processor, a storage medium, and a computer program stored in the storage medium, wherein the computer program when executed by the processor implements the machine learning modeling oriented computer task scheduling method of any one of claims 1 to 4.
7. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the machine learning modeling oriented computer task scheduling method of any one of claims 1 to 4.
CN202010242943.0A 2020-03-31 2020-03-31 Computer task scheduling method, device, equipment and medium for machine learning modeling Active CN111506402B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010242943.0A CN111506402B (en) 2020-03-31 2020-03-31 Computer task scheduling method, device, equipment and medium for machine learning modeling

Publications (2)

Publication Number Publication Date
CN111506402A CN111506402A (en) 2020-08-07
CN111506402B true CN111506402B (en) 2023-06-27

Family

ID=71867249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010242943.0A Active CN111506402B (en) 2020-03-31 2020-03-31 Computer task scheduling method, device, equipment and medium for machine learning modeling

Country Status (1)

Country Link
CN (1) CN111506402B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114357768A (en) * 2022-01-04 2022-04-15 华东师范大学 Shape generating method and system for shaft-based intelligent communication intelligent system

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004094496A (en) * 2002-08-30 2004-03-25 Ntt Comware Corp Sequence diagram preparing device, and method therefor, sequence diagram preparing program, and recording medium therefor
CN102906696A (en) * 2010-03-26 2013-01-30 维尔图尔梅特里克斯公司 Fine grain performance resource management of computer systems
CN105956021A (en) * 2016-04-22 2016-09-21 华中科技大学 Automated task parallel method suitable for distributed machine learning and system thereof
CN106327056A (en) * 2016-08-08 2017-01-11 成都四威高科技产业园有限公司 Real time clock-based AGV task management system and method
CN107885587A (en) * 2017-11-17 2018-04-06 清华大学 A kind of executive plan generation method of big data analysis process
WO2018113642A1 (en) * 2016-12-20 2018-06-28 西安电子科技大学 Control flow hiding method and system oriented to remote computing
CN109814878A (en) * 2018-12-20 2019-05-28 中国电子科技集团公司第十五研究所 Cross-platform cross commercialization is from the complicated huge information system mixed deployment system of primary climate
US10409560B1 (en) * 2015-11-18 2019-09-10 Amazon Technologies, Inc. Acceleration techniques for graph analysis programs
CN110245003A (en) * 2019-06-06 2019-09-17 中信银行股份有限公司 A kind of machine learning uniprocessor algorithm arranging system and method
CN110895484A (en) * 2018-09-12 2020-03-20 北京奇虎科技有限公司 Task scheduling method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Li Xinlei. Cloud computing task scheduling based on dependent tasks and the Sarsa(λ) algorithm. Computer Measurement & Control, 2015(08), full text. *
Chen Tingwei; Zhang Bin; Hao Xianwen. Grid dependent-task scheduling based on optimized selection of task-resource allocation graphs. Journal of Computer Research and Development, 2007(10), full text. *

Also Published As

Publication number Publication date
CN111506402A (en) 2020-08-07

Similar Documents

Publication Publication Date Title
US4853873A (en) Knowledge information processing system and method thereof
US8171001B2 (en) Using a data mining algorithm to generate rules used to validate a selected region of a predicted column
US7921330B2 (en) Data migration manager
JP2957375B2 (en) Data processing system and method for correcting character recognition errors in digital images of document format
US7617444B2 (en) File formats, methods, and computer program products for representing workbooks
EP1650675A2 (en) External metadata processing
CN110347598B (en) Test script generation method and device, server and storage medium
US9009175B2 (en) System and method for database migration and validation
US5513350A (en) Update constraints in transactions which may abort
CN110780879B (en) Decision execution method, device, equipment and medium based on intelligent compiling technology
CN111930489A (en) Task scheduling method, device, equipment and storage medium
CN110399089B (en) Data storage method, device, equipment and medium
CN111506402B (en) Computer task scheduling method, device, equipment and medium for machine learning modeling
US9582291B2 (en) Selecting a mapping that minimizes conversion costs
JP2019211805A (en) Database migration support system and program
US20040205657A1 (en) Method and system for linking project information
US20030182318A1 (en) Method and apparatus for improving transaction specification by marking application states
CN113836005A (en) Virtual user generation method and device, electronic equipment and storage medium
US8904348B2 (en) Method and system for handling errors during script execution
CN113239064A (en) Database updating method and device, electronic equipment and storage medium
Griffin et al. Integrity maintenance in a telecommunications switch
CN112015560A (en) Device for constructing IT infrastructure
CN112114811A (en) Compiling method, device and equipment
US8615730B2 (en) Modeled types-attributes, aliases and context-awareness
CN117075911B (en) Variable code conversion method from PL language to C language, storage medium and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant