CN112558938A - Machine learning workflow scheduling method and system based on directed acyclic graph - Google Patents


Info

Publication number
CN112558938A
Authority
CN
China
Prior art keywords
component
machine learning
connection
components
directed acyclic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011490334.3A
Other languages
Chinese (zh)
Other versions
CN112558938B (en)
Inventor
孙显
于泓峰
付琨
李硕轲
臧倩
祝阳光
闫志远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Information Research Institute of CAS
Original Assignee
Aerospace Information Research Institute of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Information Research Institute of CAS
Priority to CN202011490334.3A
Publication of CN112558938A
Application granted
Publication of CN112558938B
Current legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00: Arrangements for software engineering
    • G06F 8/20: Software design
    • G06F 8/24: Object-oriented
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure provides a machine learning workflow scheduling method based on a directed acyclic graph, comprising: S1, encapsulating components using component encapsulation templates corresponding to the component categories, wherein each component corresponds to at least one machine learning task; S2, obtaining the machine learning tasks and the task sequence contained in a machine learning workflow, and connecting the components corresponding to those tasks according to the task sequence; S3, generating a directed acyclic workflow graph from the connected components, and executing the machine learning workflow according to the directed acyclic workflow graph. The present disclosure also provides a machine learning workflow scheduling system based on a directed acyclic graph.

Description

Machine learning workflow scheduling method and system based on directed acyclic graph
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to a machine learning workflow scheduling method and system based on a directed acyclic graph.
Background
In recent years, Machine Learning (ML) algorithms have developed rapidly and are widely used. Owing to their simplicity and strong interpretability, these algorithms have achieved remarkable results on tasks such as target recognition and target detection, and have been successfully applied in fields such as financial trading, commodity recommendation, and traffic prediction. Most existing machine learning algorithms, however, are constructed and tested at the code level, which generally requires environment configuration, algorithm flow design, data-interface design, program writing, program debugging, service deployment, and similar procedures. For novice developers in the machine learning field, developing and testing algorithms at the code level presents real difficulty. Likewise, for urgent application projects, the time cost of developing and testing a machine learning algorithm is too high, which hinders the application and popularization of machine learning. Moreover, the existing development mode offers low reusability of machine learning algorithms and strong dependence on the development and test environment: when the same machine learning model faces different application data or is applied to different tasks, the runtime environment must be reconfigured, the underlying interfaces re-planned, the program structure adjusted, and so on, consuming a great deal of developers' time and energy and wasting human resources.
Flask is a lightweight web application framework written in Python, often called a microframework because of its simplicity and ease of extension. Flask is nevertheless powerful: personal applications built on the framework can be published to the web so that users can easily access them through a browser. The embodiments of this application are built on the Flask framework and provide developers of different fields and skill levels with a simpler, more convenient way to develop and test machine learning algorithms, simplifying the design process and improving development efficiency.
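As a hedged sketch of how such a Flask-based front end might expose a packaged component to a browser or HTTP client — the `/run_component` route, the `COMPONENTS` registry, and the `normalize` component are illustrative assumptions, not the patent's actual interface:

```python
# Hedged sketch: exposing a packaged component through a Flask route, in the
# spirit of the Flask-based front end described above.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in registry mapping component names to callables; the real system
# would map names to encapsulated component classes.
COMPONENTS = {
    "normalize": lambda params, data: [x / max(data) for x in data],
}

@app.route("/run_component", methods=["POST"])
def run_component():
    payload = request.get_json()
    component = COMPONENTS[payload["name"]]
    result = component(payload.get("params", {}), payload["data"])
    return jsonify({"output": result})
```

Calling `app.run()` would serve the endpoint, after which a client could POST `{"name": "normalize", "data": [...]}` to `/run_component` and receive the component's output as JSON.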
Disclosure of Invention
The main purpose of the present disclosure is to provide a directed acyclic graph-based machine learning workflow scheduling method and system, an electronic device, and a computer-readable storage medium. By standardizing and encapsulating the underlying code, they make algorithm construction and testing easier for novice developers, greatly reduce the time cost of model construction and testing, and facilitate the rapid popularization and wide application of machine learning algorithms.
A first aspect of the present disclosure provides a directed acyclic graph-based machine learning workflow scheduling method, including: S1, encapsulating components using component encapsulation templates corresponding to the component categories, wherein each component corresponds to at least one machine learning task; S2, obtaining the machine learning tasks and the task sequence contained in the machine learning workflow, and connecting the components corresponding to those tasks according to the task sequence; S3, generating a directed acyclic workflow graph from the connected components, and executing the machine learning workflow according to the directed acyclic workflow graph.
Optionally, encapsulating the components using the component encapsulation templates corresponding to the component categories in S1 includes: encapsulating each component with the template for its category and exposing only the component's data input interface, data output interface, and component parameter interface, so that the components can be connected to one another in the correct order.
Optionally, the method further comprises: S2I, after the components corresponding to the machine learning tasks are connected according to the task sequence, checking whether each pair of consecutively connected components conforms to the connection specification, based on the components' specified input type, input number, output type, and output number; if so, performing step S3; otherwise, rejecting any connection that does not meet the specification, reporting an error prompt, and repeating steps S1-S2I until the component connections meet the specification, then performing step S3.
Optionally, S3 includes: S31, numbering the connected components, generating corresponding connection tuples according to the components' connection relations, and marking each tuple's connection direction, wherein each connection tuple comprises two components that have a connection relation; S32, generating the directed acyclic workflow graph from the connection tuples and executing the machine learning task flow.
Optionally, the numbers and connection tuples in S31 are unique, so that each component index is unique.
Optionally, the error is reported in S2I by a pop-up box that shows the connection error information for the offending component in the machine learning task flow.
A second aspect of the present disclosure provides a directed acyclic graph-based machine learning workflow scheduling system, comprising: a component encapsulation module for encapsulating components with the component encapsulation templates corresponding to their categories, wherein each component corresponds to at least one machine learning task; a component connection module for obtaining the machine learning tasks and the task sequence contained in the machine learning workflow and connecting the components corresponding to those tasks according to the task sequence; and a directed acyclic workflow graph construction module for generating a directed acyclic workflow graph from the connected components and executing the machine learning workflow according to it.
Optionally, the system further comprises: a component connection specification checking module for checking, among the connected components output by the component connection module, whether each pair of consecutively connected components conforms to the connection specification, based on the components' specified input type, input number, output type, and output number.
A third aspect of the present disclosure provides an electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the program, it implements the directed acyclic graph-based machine learning workflow scheduling method provided by the first aspect of the present disclosure.
A fourth aspect of the present disclosure provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the directed acyclic graph-based machine learning workflow scheduling method provided by the first aspect of the present disclosure.
Compared with the prior art, the present disclosure has the following beneficial effects:
1) machine learning algorithms can be constructed and tested without writing underlying code;
2) a componentized machine learning flow is constructed, connection validity is checked, and prompts are given;
3) the constructed machine learning directed acyclic workflow graph runs automatically according to its connection relations and logic;
4) the difficulty of developing and testing machine learning algorithms is reduced, along with the time cost of algorithm construction and testing.
Drawings
For a more complete understanding of the present disclosure and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
figs. 1 and 2 schematically illustrate flowcharts of a directed acyclic graph-based machine learning workflow scheduling method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a modular component packaging schematic provided according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow diagram for generating a directed acyclic graph workflow in a directed acyclic graph-based machine learning workflow scheduling method according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a multi-branch directed acyclic workflow graph generated by a directed acyclic graph-based machine learning workflow scheduling method according to an embodiment of the present disclosure;
FIGS. 6 and 7 schematically illustrate block diagrams of a directed acyclic graph-based machine learning workflow scheduling system, according to an embodiment of the present disclosure;
fig. 8 schematically shows a hardware structure diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B, and C, etc." is used, such a construction is intended in the sense one having skill in the art would understand it (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together). Where a convention analogous to "at least one of A, B, or C, etc." is used, it is likewise intended in the sense one having skill in the art would understand it (e.g., "a system having at least one of A, B, or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together).
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks. The techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon for use by or in connection with an instruction execution system.
Fig. 1 schematically illustrates a flowchart of a directed acyclic graph-based machine learning workflow scheduling method according to an embodiment of the present disclosure.
As shown in fig. 1, the method includes: S1-S3.
And S1, packaging the components by adopting component packaging templates corresponding to the component categories, wherein each component at least corresponds to one machine learning task.
According to an embodiment of the present disclosure, each component's category and encapsulation template are defined according to the component's function, which facilitates later creation of new components and effective connections between different components; the component is then encapsulated according to the defined modular encapsulation specification, and the encapsulated components are all at the same level. The components include data components, data preprocessing components, machine learning model components, evaluation algorithm components, and the like, and each component corresponds to at least one machine learning task.
Specifically, the component encapsulation definition is shown in fig. 3; a component definition includes the input type, input data, output type, output number, component parameters, the definition of the component's function implementation, and so on. Different categories of components have different inputs and outputs; for example, a data set component has only outputs and no inputs, and its output data type can be image classification, target detection, named entity recognition, etc., while a feature extraction component's input type is a data set and its output type is extracted features. In addition, different components have different component parameters, so component parameters are defined and specified by storing each parameter's name, default value, type, and other information as dictionary key-value pairs, which allows the parameters to be extended freely. The specific function implementation also differs across components, and modular encapsulation exposes only each component's data input, data output, and parameter settings, so that the output of one component corresponds to the input of the next component connected to it.
Specifically, modular encapsulation requires establishing a corresponding class for each component, in which all of the component's parameters are defined, including the input type, input number, output type, and output number. The specific component is invoked in the entry method of the component's function to perform the specific machine learning task; the obtained component parameters are passed to the program inside the component's function implementation, which completes the execution of the component's function.
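The class-per-component encapsulation described above might be sketched as follows; all class, attribute, and method names (`BaseComponent`, `input_type`, `run`, the toy data) are illustrative assumptions, not the patent's actual identifiers:

```python
# Illustrative sketch of the modular component encapsulation described above.
class BaseComponent:
    """Common template: exposes only data I/O and component parameters."""
    input_type = None    # e.g. "dataset", "features"
    input_num = 0
    output_type = None
    output_num = 0

    def __init__(self, **overrides):
        # Parameters kept as dictionary key-value pairs (name, default value,
        # type, ...) so new parameters can be added freely, as the text suggests.
        self.params = {"name": type(self).__name__}
        self.params.update(overrides)

    def run(self, *inputs):
        # Entry method: a concrete component implements its ML task here.
        raise NotImplementedError

class DatasetComponent(BaseComponent):
    # A data set component has outputs but no inputs.
    output_type, output_num = "dataset", 1
    def run(self):
        return [1.0, 2.0, 4.0]          # stand-in for loaded image data

class FeatureExtractionComponent(BaseComponent):
    # Input type is a data set; output type is extracted features.
    input_type, input_num = "dataset", 1
    output_type, output_num = "features", 1
    def run(self, dataset):
        scale = self.params.get("scale", 1.0)
        return [x * scale for x in dataset]
```

Because the output declaration of one component matches the input declaration of the next, the classes can be chained directly, e.g. `FeatureExtractionComponent(scale=0.5).run(DatasetComponent().run())`.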
And S2, acquiring the machine learning tasks and task sequences contained in the machine learning workflow, and connecting the components corresponding to the machine learning tasks according to the task sequences.
According to an embodiment of the present disclosure, the components corresponding to the machine learning tasks are connected, after specification-compliant encapsulation, in the component-category order of data set, data preprocessing, feature extraction, machine learning algorithm (e.g., classification, detection), and performance evaluation.
S3, generating a directed acyclic workflow diagram according to the connected components, and executing a machine learning workflow according to the directed acyclic workflow diagram.
According to an embodiment of the present disclosure, fig. 4 shows a schematic flowchart for generating the directed acyclic workflow graph, which includes: S31, numbering the connected components, generating corresponding connection tuples according to the components' connection relations, and marking each tuple's connection direction, wherein each connection tuple comprises two components that have a connection relation; S32, generating the directed acyclic workflow graph from the connection tuples and executing the machine learning task flow.
Specifically, a connection tuple is a tuple with a connection direction whose two ends connect components of different categories, so that the corresponding connection relation is valid. The number and direction of each connection tuple are unique, which makes every component index unique and avoids problems such as invalid connection relations caused by repeatedly connecting components of the same category.
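Step S31 — numbering the connected components and recording each connection as a directed two-component tuple — can be sketched as below; the function name and data layout are assumptions for illustration:

```python
# Minimal sketch of S31: assign each connected component a unique number and
# record every user-drawn connection as a directed tuple (source_id, target_id);
# each tuple holds exactly the two components that have a connection relation.
from collections import defaultdict

def build_workflow_graph(components, connections):
    """components: component names in the order they were added.
    connections: (src_name, dst_name) pairs drawn by the user."""
    # Unique numbering makes every component index unambiguous, even when
    # several components of the same category appear in the workflow.
    ids = {name: i for i, name in enumerate(components)}
    edges = [(ids[a], ids[b]) for a, b in connections]
    graph = defaultdict(list)       # adjacency list of the directed graph
    for src, dst in edges:
        graph[src].append(dst)
    return ids, edges, dict(graph)
```

The resulting adjacency list is the directed acyclic workflow graph of S32, ready to be traversed in dependency order.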
As shown in fig. 2, the method further comprises: S2I, after the components corresponding to the machine learning tasks are connected according to the task sequence, checking whether each pair of consecutively connected components conforms to the connection specification, based on the components' specified input type, input number, output type, and output number; if so, performing step S3; otherwise, rejecting any connection that does not meet the specification, reporting an error prompt, and repeating steps S1-S2I until the component connections meet the specification, then performing step S3.
According to an embodiment of the present disclosure, after the components corresponding to the machine learning tasks are connected in S2, a connected machine learning task flow is obtained and automatically checked to eliminate possible connection errors, including component-category connection errors and logic errors, which is why the component connection specification check is required. Whether the connection between a front component and a rear component meets the requirements is judged from each component's input and output types and numbers: if the input type or number does not meet the requirements, or the front component's output type does not match the rear component's input type, the component connection specification is violated. Logic errors must also be judged; for example, a performance evaluation component cannot precede a machine learning algorithm component, a flow cannot consist of a data set component alone without a data processing or machine learning algorithm component, and a flow cannot lack a data set component. Checking the constructed machine learning task flow against the component connection specification ensures that no logic errors occur when the task later runs, and a corresponding prompt is given for any component connection found not to conform to the specification.
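A hedged sketch of the pairwise connection check described above — a connection is accepted only when the upstream component's output type and count match the downstream component's declared input type and count; the dictionary field names are assumptions, not the patent's schema:

```python
# Sketch of the connection-specification check between two consecutively
# connected components. Returns a list of error messages; an empty list
# means the pair conforms to the specification.
def check_connection(upstream, downstream):
    """Each argument is a dict with input_type/input_num/output_type/output_num."""
    errors = []
    if upstream["output_type"] != downstream["input_type"]:
        errors.append("type mismatch: %s output cannot feed %s input"
                      % (upstream["output_type"], downstream["input_type"]))
    if upstream["output_num"] != downstream["input_num"]:
        errors.append("count mismatch: %d outputs vs %d inputs"
                      % (upstream["output_num"], downstream["input_num"]))
    return errors
```

In the system described here, a non-empty error list would trigger the pop-up prompt and cause the offending connection to be rejected.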
Specifically, the error reporting method adopted in the embodiments of the present disclosure is a pop-up box that displays the component connection error information in the machine learning task flow.
Fig. 5 schematically illustrates a multi-branch directed acyclic workflow diagram generated by a directed acyclic graph-based machine learning workflow scheduling method according to an embodiment of the present disclosure.
As shown in fig. 5, each single branch in the multi-branch directed acyclic workflow graph is a machine learning workflow; that is, fig. 5 corresponds to two machine learning workflow graphs, in which the components are connected by connection tuples 550, each with a connection direction and comprising two components that have a connection relation. As shown in fig. 5, the training process of the directed acyclic workflow graph corresponding to the first branch is as follows. The user first selects a data set (e.g., image data) to be trained and feeds the training data set for the target task into the entry of the data set component 510. The input image data is loaded by the data set component 510, whose output is fed into the entry of the first data preprocessing component 521; the first data preprocessing component 521 preprocesses the image data and outputs data that meets the input specification of the first machine learning model component 531. The output of the first data preprocessing component 521 is fed into the first machine learning model component 531 for the particular task, and the first machine learning model component 531 infers the target task's output, a classification, segmentation, or detection output that conforms to the input specification of the result evaluation component 540. Finally, the result evaluation component 540 evaluates the machine learning model's performance on the output of the first machine learning model component 531.
To compare the results of different machine learning model components during training, training-set data can be fed through the parallel structure of the multi-branch directed acyclic workflow graph. For the same training data set, the same input data is loaded through the data set component 510 as in the flow above and output to the entry of the second data preprocessing component 522; the second data preprocessing component 522 preprocesses the image data and outputs data that meets the input specification of the second machine learning model component 532; the output of the second data preprocessing component 522 is fed into the second machine learning model component 532 for the specific task, and the second machine learning model component 532 infers a new target-task output. The result evaluation component 540 evaluates the machine learning model's performance on the new output of the second machine learning model component 532; at the same time, the performance evaluation results of the different machine learning models can be compared, and the performances of the different models can then be combined to output a performance result for the flow. After training, the user can select different machine learning model components to test on the test data set: likewise, the test data set is loaded and output through the data set component 510 and fed into the data preprocessing component corresponding to the selected machine learning model component 531 or 532, which outputs data conforming to that model component's input specification; the selected machine learning model component 531 or 532 infers the test-task output conforming to the input specification of the result evaluation component, and the result evaluation component 540 evaluates the test result.
It should be noted that the multi-branch directed acyclic workflow graph is not limited to the two-branch graph disclosed in this embodiment; according to actual needs, a user may construct a machine learning workflow with more branches, following the method of this embodiment, to train and test data sets for the practical application at hand.
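Executing such a multi-branch graph amounts to running components in topological order, so each component fires only after all of its upstream outputs exist. A hedged sketch, with stand-in component functions (the node names and callables are illustrative, not the patent's):

```python
# Illustrative execution of a multi-branch directed acyclic workflow graph:
# traverse the graph in topological order and feed each component the
# results of its upstream components.
from graphlib import TopologicalSorter  # Python 3.9+ standard library

def run_workflow(graph, funcs):
    """graph: {node: set of upstream nodes}; funcs: node -> callable(inputs)."""
    results = {}
    for node in TopologicalSorter(graph).static_order():
        # Gather upstream outputs in a deterministic order.
        inputs = [results[p] for p in sorted(graph.get(node, ()))]
        results[node] = funcs[node](inputs)
    return results
```

A two-branch workflow like fig. 5 — one data set feeding two preprocessing/model branches that meet at a shared evaluation node — is just a `graph` whose two branch heads share the same upstream data-set node.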
Fig. 6 schematically illustrates a block diagram of a directed acyclic graph-based machine learning workflow scheduling system, according to an embodiment of the present disclosure.
As shown in fig. 6, the directed acyclic graph-based machine learning workflow scheduling system 600 includes a component encapsulation module 610, a component connection module 620, and a directed acyclic workflow graph construction module 630, and may be used to implement the directed acyclic graph-based machine learning workflow scheduling method described with reference to figs. 1-2.
The component encapsulation module 610 encapsulates components with the component encapsulation templates corresponding to their categories, wherein each component corresponds to at least one machine learning task.
The component connection module 620 is configured to obtain the machine learning tasks and the task sequence contained in the machine learning workflow, and to connect the components corresponding to those tasks according to the task sequence.
The directed acyclic workflow graph construction module 630 is configured to generate a directed acyclic workflow graph from the connected components and to execute the machine learning workflow according to it.
As shown in fig. 7, the system further includes a component connection specification checking module 620', which checks, for each pair of consecutively connected components output by the component connection module, conformance to the connection specification based on the components' specified input type, input number, output type, and output number.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any of the component encapsulation module 610, the component connection module 620, the component connection specification checking module 620' and the directed acyclic workflow diagram construction module 630 may be combined into one module to be implemented, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the component packaging module 610, the component connection module 620, the component connection specification checking module 620' and the directed acyclic workflow diagram building module 630 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware and firmware, or by a suitable combination of any several of them. Alternatively, at least one of the component encapsulation module 610, the component connection module 620, the component connection specification checking module 620' and the directed acyclic workflow diagram construction module 630 can be implemented at least in part as a computer program module that, when executed, can perform corresponding functions.
Fig. 8 schematically shows a hardware structure diagram of an electronic device according to an embodiment of the present disclosure.
As shown in fig. 8, the electronic device 800 of this embodiment includes a memory 810, a processor 820, and a computer program stored in the memory 810 and executable on the processor 820; when executing the program, the processor implements the directed acyclic graph-based machine learning workflow scheduling method described in the embodiments illustrated in figs. 1 to 2.
According to an embodiment of the present disclosure, the electronic device further includes at least one input device 830 and at least one output device 840. The memory 810, processor 820, input device 830, and output device 840 are coupled via a bus 850.
The input device 830 may be a touch panel, a physical button, a mouse, or the like. The output device 840 may be embodied as a display screen. The memory 810 may be a high-speed random access memory (RAM) or a non-volatile memory, such as a disk memory. The memory 810 stores a set of executable program code, and the processor 820 is coupled to the memory 810.
An embodiment of the present invention further provides a computer-readable storage medium, which may be provided in the terminal of the foregoing embodiments and may be the memory of the embodiment shown in fig. 8. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the directed acyclic graph-based machine learning workflow scheduling method described in the embodiments illustrated in figs. 1 to 2. The computer storage medium may also be any of various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
It should be noted that the functional modules in the embodiments of the present invention may be integrated into one processing module, may each exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the part of the technical solution of the present invention that in substance contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined in various ways, even if such combinations are not expressly recited in the present disclosure. In particular, such combinations may be made without departing from the spirit and teaching of the present disclosure, and all such combinations fall within the scope of the present disclosure.
While the disclosure has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims and their equivalents. Accordingly, the scope of the present disclosure should not be limited to the above-described embodiments, but should be defined not only by the appended claims, but also by equivalents thereof.

Claims (8)

1. A machine learning workflow scheduling method based on a directed acyclic graph is characterized by comprising the following steps:
s1, encapsulating components using component encapsulation templates corresponding to their component categories, wherein each component corresponds to at least one machine learning task;
s2, acquiring machine learning tasks and task sequences contained in the machine learning workflow, and connecting components corresponding to the machine learning tasks according to the task sequences;
s3, generating a directed acyclic workflow diagram according to the connected components, and executing a machine learning workflow according to the directed acyclic workflow diagram.
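The three steps of claim 1 can be illustrated by the following minimal Python sketch, which is not part of the claimed implementation; all class, function, and variable names (`Component`, `connect`, `run_workflow`) are illustrative assumptions. It encapsulates each machine learning task in a component, records directed connections, and executes the resulting directed acyclic workflow graph in topological order:

```python
# Illustrative sketch of steps S1-S3; names are hypothetical, not from the patent.
from collections import defaultdict, deque

class Component:
    """S1: a component wrapping one machine learning task (illustrative)."""
    def __init__(self, name, task):
        self.name = name
        self.task = task  # callable standing in for the ML task

def connect(edges, upstream, downstream):
    """S2: record a directed connection from upstream to downstream."""
    edges[upstream.name].append(downstream.name)

def run_workflow(components, edges):
    """S3: topologically order the directed acyclic workflow graph and run it."""
    indegree = {c.name: 0 for c in components}
    for src in edges:
        for dst in edges[src]:
            indegree[dst] += 1
    ready = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for dst in edges[n]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    if len(order) != len(components):
        raise ValueError("workflow graph contains a cycle")
    by_name = {c.name: c for c in components}
    for n in order:
        by_name[n].task()  # execute each task in dependency order
    return order
```

A workflow of three components connected in sequence would then execute in that sequence; a cyclic connection would be rejected before execution.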
2. The directed acyclic graph-based machine learning workflow scheduling method of claim 1, wherein said encapsulating the component with a component encapsulation template corresponding to a component class in S1 comprises:
encapsulating the components using a component encapsulation template corresponding to the component category, and exposing each component's data input interface, data output interface, and component parameter interface for connecting the components in sequence.
3. The directed acyclic graph-based machine learning workflow scheduling method of claim 1, further comprising:
S2I, after the components corresponding to the machine learning tasks are connected according to the task sequence, checking whether each pair of successively connected components conforms to the connection specification according to the components' specified input types, input counts, output types, and output counts; if so, performing step S3; otherwise, rejecting the connection that does not conform to the specification and reporting an error prompt, and repeating steps S1-S2I until the component connections conform to the specification, then performing step S3.
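The connection check of claim 3 can be sketched as follows; this is a hypothetical illustration, and the field names (`input_types`, `output_types`) are assumptions rather than the patent's data model. A connection passes only when the upstream component's output count and types match the downstream component's declared input count and types:

```python
# Illustrative connection-specification check (step S2I); names are hypothetical.
class ComponentSpec:
    def __init__(self, name, input_types, output_types):
        self.name = name
        self.input_types = input_types    # e.g. ["dataset"]
        self.output_types = output_types  # e.g. ["model"]

def check_connection(upstream, downstream):
    """Return (ok, message); a failed check would trigger the error prompt."""
    if len(upstream.output_types) != len(downstream.input_types):
        return False, (f"{upstream.name} -> {downstream.name}: "
                       f"output count does not match input count")
    for out_t, in_t in zip(upstream.output_types, downstream.input_types):
        if out_t != in_t:
            return False, (f"{upstream.name} -> {downstream.name}: "
                           f"type {out_t!r} does not match {in_t!r}")
    return True, "ok"
```

A data-loading component producing a dataset could then connect to a training component that consumes a dataset, while the reverse connection would be rejected.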
4. The directed acyclic graph-based machine learning workflow scheduling method of claim 1, wherein said S3 comprises:
s31, numbering the connected components, generating corresponding connection tuples according to the components' connection relations, and marking the connection direction of each connection tuple, wherein each connection tuple comprises two components having a connection relation;
s32, generating a directed acyclic workflow diagram based on the connection tuples, and executing the machine learning task flow.
5. The method according to claim 4, wherein the numbers and the connection tuples in S31 are unique, so that each component index is unique.
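Steps S31 and S32 of claims 4 and 5 can be sketched as follows; this is an illustrative reading, and the function names are assumptions. Each connected component receives a unique number, and each connection becomes a directed 2-tuple whose order marks the connection direction:

```python
# Illustrative sketch of S31: unique numbering and directed connection tuples.
def number_components(names):
    """Assign a unique number to each component, so indexes are unambiguous (claim 5)."""
    return {name: i for i, name in enumerate(names)}

def build_connection_tuples(connections, ids):
    """Each tuple holds the two connected components, ordered src -> dst."""
    return [(ids[src], ids[dst]) for src, dst in connections]
```

The directed acyclic workflow diagram of S32 is then simply the graph whose vertices are the component numbers and whose edges are these tuples.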
6. The method according to claim 3, wherein the error prompt in S2I is a pop-up notification of the component's connection error information in the machine learning task flow.
7. A directed acyclic graph-based machine learning workflow scheduling system, comprising:
the component encapsulation module is configured to encapsulate components using component encapsulation templates corresponding to their component categories, wherein each component corresponds to at least one machine learning task;
the component connection module is configured to acquire the machine learning tasks and task sequence contained in the machine learning workflow, and to connect the components corresponding to the machine learning tasks according to the task sequence;
and the directed acyclic workflow graph construction module is configured to generate a directed acyclic workflow graph from the connected components and to execute the machine learning workflow according to the directed acyclic workflow graph.
8. The directed acyclic graph-based machine learning workflow scheduling system of claim 7, further comprising:
and the component connection specification checking module is configured to check whether successively connected components, among the connected components output by the component connection module, conform to the connection specification according to the components' specified input types, input counts, output types, and output counts.
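The wiring of the four modules in claims 7 and 8 can be sketched as a simple pipeline; this is a hypothetical composition, and the class and parameter names are illustrative, not from the patent. The checking module sits between the connection module and the graph construction module, so a non-conforming connection is rejected before the workflow graph is built:

```python
# Hypothetical composition of the four system modules from claims 7-8.
class WorkflowSchedulingSystem:
    def __init__(self, encapsulate, connect, check, build):
        self.encapsulate = encapsulate  # component encapsulation module
        self.connect = connect          # component connection module
        self.check = check              # connection specification checking module
        self.build = build              # directed acyclic workflow graph module

    def run(self, tasks, task_order):
        components = [self.encapsulate(t) for t in tasks]
        connections = self.connect(components, task_order)
        for upstream, downstream in connections:
            ok, msg = self.check(upstream, downstream)
            if not ok:
                raise ValueError(msg)  # the error prompt of claim 6
        return self.build(connections)
```

Each constructor argument can be any callable implementing the corresponding module, which mirrors the disclosure's point that the modules may be combined, split, or implemented in software or hardware.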
CN202011490334.3A 2020-12-16 2020-12-16 Machine learning workflow scheduling method and system based on directed acyclic graph Active CN112558938B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011490334.3A CN112558938B (en) 2020-12-16 2020-12-16 Machine learning workflow scheduling method and system based on directed acyclic graph


Publications (2)

Publication Number Publication Date
CN112558938A true CN112558938A (en) 2021-03-26
CN112558938B CN112558938B (en) 2021-11-09

Family

ID=75064283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011490334.3A Active CN112558938B (en) 2020-12-16 2020-12-16 Machine learning workflow scheduling method and system based on directed acyclic graph

Country Status (1)

Country Link
CN (1) CN112558938B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017146816A1 (en) * 2016-02-26 2017-08-31 Google Inc. Compiler techniques for mapping program code to a high performance, power efficient, programmable image processing hardware platform
CN108541321A (en) * 2016-02-26 2018-09-14 谷歌有限责任公司 Program code is mapped to the technique of compiling of the programmable graphics processing hardware platform of high-performance, high effect
CN105893509A (en) * 2016-03-30 2016-08-24 电子科技大学 Marking and explaining system and method for large-data analysis model
CN107450972A (en) * 2017-07-04 2017-12-08 阿里巴巴集团控股有限公司 A kind of dispatching method, device and electronic equipment
CN108985367A (en) * 2018-07-06 2018-12-11 中国科学院计算技术研究所 Computing engines selection method and more computing engines platforms based on this method
CN111124387A (en) * 2018-11-01 2020-05-08 百度在线网络技术(北京)有限公司 Modeling system, method, computer device and storage medium for machine learning platform
CN110728371A (en) * 2019-09-17 2020-01-24 第四范式(北京)技术有限公司 System, method and electronic device for executing automatic machine learning scheme
CN110956272A (en) * 2019-11-01 2020-04-03 第四范式(北京)技术有限公司 Method and system for realizing data processing
CN111488211A (en) * 2020-04-09 2020-08-04 北京嘀嘀无限科技发展有限公司 Task processing method, device, equipment and medium based on deep learning framework
CN111310936A (en) * 2020-04-15 2020-06-19 光际科技(上海)有限公司 Machine learning training construction method, platform, device, equipment and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CARL WITT et al.: "Predictive performance modeling for distributed batch processing using black box monitoring and machine learning", INFORMATION SYSTEMS *
RICHARDLEEH: "ICT, CAS open-sources Easy Machine Learning: making machine learning application development simple and fast", HTTPS://BLOG.CSDN.NET/LIHUINIHAO/ARTICLE/DETAILS/73175856 *
TIANYOU GUO et al.: "Ease the Process of Machine Learning with Dataflow", CIKM '16: PROCEEDINGS OF THE 25TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT *
YANG Chengwei: "Research on dynamic workflow optimization and scheduling in cloud computing environments", China Doctoral Dissertations Full-text Database, Information Science and Technology *
ZHAO Lingling et al.: "Workflow-style machine learning analysis method based on Spark", Computer Systems & Applications *


Similar Documents

Publication Publication Date Title
González et al. Atltest: A white-box test generation approach for ATL transformations
CN102598001B (en) Techniques and system for analysis of logic designs with transient logic
US20130145347A1 (en) Automatic modularization of source code
EP3432229A1 (en) Ability imparting data generation device
CN109857641A (en) The method and device of defects detection is carried out to program source file
WO2021036697A1 (en) Commissioning method and apparatus, device and storage medium
CN110297760A (en) Building method, device, equipment and the computer readable storage medium of test data
JP2022540871A (en) A technique for visualizing the behavior of neural networks
JP2009054147A (en) Method, system, and computer program product for generating automated assumption for compositional verification
US10970449B2 (en) Learning framework for software-hardware model generation and verification
González et al. Test data generation for model transformations combining partition and constraint analysis
Nicholson et al. Automated verification of design patterns: A case study
US20210311729A1 (en) Code review system
Johansson Lemma discovery for induction: a survey
US11610134B2 (en) Techniques for defining and executing program code specifying neural network architectures
CN112783513B (en) Code risk checking method, device and equipment
US20110265050A1 (en) Representing binary code as a circuit
US12001823B2 (en) Systems and methods for building and deploying machine learning applications
CN112558938B (en) Machine learning workflow scheduling method and system based on directed acyclic graph
Dechsupa et al. An automated framework for BPMN model verification achieving branch coverage
Kavitha et al. Explainable AI for Detecting Fissures on Concrete Surfaces Using Transfer Learning
Karwa et al. Android based application for fruit quality analysis
CN113221126B (en) TensorFlow program vulnerability detection method and device and electronic equipment
EP4280120A1 (en) Method for processing decision data, device and computer program corresponding
Phawade et al. Bounded model checking for unbounded client server systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant