CN113505895A

CN113505895A - Machine learning engine service system, model training method and configuration method

Info

Publication number: CN113505895A
Application number: CN202110897441.6A
Authority: CN
Inventors: 程战战
Original assignee: Shanghai Goldway Intelligent Transportation System Co Ltd
Current assignee: Shanghai Goldway Intelligent Transportation System Co Ltd
Priority date: 2021-08-05
Filing date: 2021-08-05
Publication date: 2021-10-15
Anticipated expiration: 2041-08-05
Also published as: CN113505895B

Abstract

The embodiments of the invention provide a machine learning engine service system, a model training method and a configuration method. Based on target configuration information and according to a target training flow, a model training engine is called to train the model on a target data set. In the system provided by the embodiments of the invention, the configuration information of a task model contains the information that must be called when the task model is trained, so a user can directly call the corresponding configuration file to train the model and obtain the desired result without code migration, which lowers the barrier to entry for machine learning and improves its robustness.

Description

Machine learning engine service system, model training method and configuration method

Technical Field

The invention relates to the technical field of machine learning, in particular to a machine learning engine service system, a model training method and a configuration method.

Background

Currently, machine learning techniques are widely applied in various technical fields, such as video surveillance, behavior analysis, and image processing.

In order to implement machine learning, the related art provides several deep learning frameworks, such as Caffe (Convolutional Architecture for Fast Feature Embedding), TensorFlow, and PyTorch.

With these deep learning frameworks, developers working with common machine learning models, such as neural network models, do not need to code a complex neural network from scratch. They can select an existing model as needed and obtain its parameters through training, or add layers to an existing model and then train it.

However, when a new algorithm needs to be used, a developer needs to write new algorithm code for the new algorithm and then call the function interface of the deep learning framework to use the new algorithm customized by the developer.

With the rapid development of artificial intelligence technology, advanced machine learning algorithms have emerged, such as semi-supervised learning and active learning. An advanced machine learning algorithm is defined as a machine learning idea or paradigm that is usually realized by combining a plurality of sub-algorithms; the sub-algorithms used by each advanced machine learning algorithm are not fixed and can be selected by developers according to actual requirements.

In actual algorithm development, for each advanced machine learning algorithm, developers often research and develop each required sub-algorithm in isolation, and then splice the sub-algorithms together by migrating code, thereby defining a new algorithm.

In fact, some single sub-algorithms may be applied to multiple advanced machine learning algorithms. In this case, the individual sub-algorithms still need to be migrated once for each advanced machine learning algorithm, which is very labor intensive.

Therefore, the deep learning frameworks in the related art cannot be applied directly to advanced machine learning algorithms: developers must research and develop each sub-algorithm in isolation and then migrate code, which results in poor robustness.

Disclosure of Invention

The embodiment of the invention aims to provide a machine learning engine service system and a model training method so as to improve the robustness of machine learning. The specific technical scheme is as follows:

In a first aspect of the embodiments of the present invention, there is provided a machine learning engine service system, including:

the system comprises a model management module, a data management module, a model training engine, a configuration information storage module and a model information storage module;

the model training engine is used for training a network model;

the configuration information storage module is used for storing the configuration information of each task model provided by the system; the configuration information of each task model comprises: structural information of a network model used by the task model, hyper-parameters of the network model, and configuration information of a data set used by the task model;

the model information storage module is used for storing model data of each network model provided by the system; the model data of the network model comprises program code of the network model;

the model management module is used for acquiring target configuration information of a corresponding target task model from the configuration information storage module based on a target task determined by a user, and constructing and loading the target task model based on the target configuration information and model data of a network model stored in the model information storage module; calling the data management module to load a target data set; calling the model training engine according to a preset target training flow of the target task, and training the target task model based on the target data set; the target training process comprises: a sequence of training the target network model in the target configuration information;

and the data management module is used for loading the target data set based on the configuration information of the target data set in the target configuration information when the data management module is called by the model management module.

In one embodiment of the invention, the system further comprises: the system comprises a policy information storage module and a policy management module;

in the configuration information stored in the configuration information storage module, the configuration information of at least one task model further includes policy configuration information; the policy configuration information includes: the identifier of at least one strategy that the task model needs to call, and strategy parameter configuration information of each strategy; one strategy identifier corresponds to one sub-algorithm of an advanced machine learning algorithm;

the strategy information storage module is used for storing the program code of each strategy provided by the system;

the model management module is specifically configured to obtain target configuration information of a corresponding target task model from the configuration information storage module based on a target task determined by a user, and construct and load the target task model based on the target configuration information and model data of a network model stored in the model information storage module; calling the data management module to load a target data set;

under the condition that the configuration information of the target task model comprises strategy configuration information, calling the model training engine and the strategy management module according to a preset target training flow of the target task, and training the target task model based on the target data set; the target training process comprises: training a target network model in the target configuration information, and loading and executing a target strategy in the target configuration information;

and the policy management module is used for loading and executing the target policy based on the program code of the policy stored in the policy information storage module and the policy configuration information of the target task model when the policy management module is called, and feeding back a policy execution result to the model management module.

In one embodiment of the invention, the system further comprises training interfaces; each training interface corresponds to one task model and contains the training flow of the corresponding task model;

when the configuration information of the target task model contains policy information, the model management module calling the model training engine and the strategy management module according to the preset target training flow of the target task, and training the target task model based on the target data set, comprises: determining a target training interface corresponding to the target task, and calling the target training interface so as to call the model training engine and the strategy management module according to the preset target training flow of the target task and train the target task model based on the target data set.

In one embodiment of the invention, the system further comprises: a model configuration module;

the model configuration module is used for acquiring configuration information of a research and development user for configuring a task model and storing the configuration information into the configuration information storage module;

the model information storage module is also used for storing model data of a new network model imported by a research and development user;

the strategy information storage module is also used for storing program codes of new strategies introduced by research and development users.

In one embodiment of the invention, the system further comprises: a data output module;

and the data output module is used for outputting various data to be output according to a preset unified output format.

In one embodiment of the invention, the system further comprises one or more of a script display module, a tool module, and a log analysis module:

the script display module is used for displaying, based on a display instruction triggered by a user and/or a research and development user, a script example of the process by which the model management module trains the target task model;

the tool module is used for providing auxiliary tools for users and/or research and development users; the auxiliary tools at least comprise a feature visualization tool, which is used for displaying feature changes and/or feature distributions during training of the target task model;

and the log analysis module is used for analyzing the log data generated by the model management module in the process of training the target task model.

In a second aspect of the embodiments of the present invention, there is further provided a method for training a task model, which is applied to the machine learning engine service system, where the method includes:

obtaining, by the model management module, a target task determined by a user;

acquiring target configuration information of a corresponding target task model from the configuration information storage module;

constructing and loading the target task model based on the target configuration information and model data of the network model stored in the model information storage module; the model data of the network model comprises program code of the network model;

calling the data management module, and loading a target data set based on the configuration information of the target data set in the target configuration information;

calling the model training engine according to a preset training flow of the target task, and training the target task model based on the target data set; wherein the training flow comprises training the target network model in the target configuration information.

In the configuration information stored in the configuration information storage module, the configuration information of at least one task model further includes policy configuration information; the policy configuration information includes: the identifier of at least one strategy that the task model needs to call, and strategy parameter configuration information of each strategy; wherein one strategy identifier corresponds to one sub-algorithm of an advanced machine learning algorithm;

the policy management module is configured to, when being called by the model management module, load and execute the target policy based on the program code of the policy stored in the policy information storage module and the policy configuration information of the target task model, and feed back a policy execution result to the model management module;

the step of calling the model training engine according to a preset training flow of the target task and training the target task model based on the target data set comprises the following steps:

under the condition that the configuration information of the target task model contains strategy information, calling the model training engine and the strategy management module according to a preset target training flow of the target task, and training the target task model based on the target data set; the target training process comprises: training a target network model in the target configuration information, and loading and executing a target strategy in the target configuration information.

under the condition that the configuration information of the target task model contains strategy information, calling the model training engine and the strategy management module according to a preset target training flow of the target task, and training the target task model based on the target data set, wherein the step comprises the following steps of:

determining a target training interface corresponding to the target task, calling the target training interface to call the model training engine and the strategy management module according to a preset target training flow of the target task, and training the target task model based on the target data set.

In an embodiment of the present invention, when the configuration information of the target task model includes policy information, the step of calling the model training engine and the policy management module according to a preset training flow of the target task, and training the target task model based on the target data set includes:

determining a current training flow step to be executed based on a preset training flow of the target task;

under the condition that the current training flow step is a step of executing a target strategy, calling the strategy management module, obtaining a program code of the target strategy from the strategy information storage module for loading, and executing the target strategy in the current training flow step;

receiving a strategy execution result fed back by the strategy management module;

when the current training process step is to train a target network model in the target configuration information, calling the model training engine, and training the target network model based on the target data set;

receiving a training result fed back by the model training engine;

and, when the current training flow step is not the last step of the training flow, returning to the step of determining, based on the preset training flow of the target task, the current training flow step to be executed.
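The step-by-step dispatch described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the step kinds `"strategy"` and `"train"`, the function `run_training_flow`, and the callables standing in for the policy management module and the model training engine are all hypothetical names introduced here.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical step kinds; the text only distinguishes "execute a target
# strategy" from "train a target network model".
STRATEGY = "strategy"
TRAIN = "train"


def run_training_flow(
    flow: List[Tuple[str, str]],
    policy_manager: Callable[[str], Any],
    training_engine: Callable[[str], Any],
) -> List[Tuple[str, str, Any]]:
    """Walk a preset training flow in order and dispatch each step."""
    results = []
    for kind, target in flow:
        if kind == STRATEGY:
            # Strategy step: the policy manager loads and executes the strategy.
            outcome = policy_manager(target)
        elif kind == TRAIN:
            # Training step: the training engine trains the named network model.
            outcome = training_engine(target)
        else:
            raise ValueError(f"unknown step kind: {kind!r}")
        # Collect the result fed back by whichever module was called.
        results.append((kind, target, outcome))
    # The loop ends once the last step of the flow has been executed.
    return results


def demo_flow() -> List[Tuple[str, str, Any]]:
    """Run a made-up three-step flow with stand-in modules."""
    flow = [
        (STRATEGY, "pseudo_label"),
        (TRAIN, "student"),
        (STRATEGY, "least_confidence"),
    ]
    return run_training_flow(
        flow,
        policy_manager=lambda name: f"executed {name}",
        training_engine=lambda name: f"trained {name}",
    )
```

Representing the flow as an ordered list keeps the execution order in data rather than in code, which is one plausible way to let the same loop serve different task models.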

In one embodiment of the invention, the method further comprises:

and outputting various data to be output according to a preset unified output format.

In a third aspect of the embodiments of the present invention, there is further provided a method for configuring a task model, which is applied to the machine learning engine service system, and the method includes:

receiving model data of each network model imported by a research and development user, and storing the model data into the model information storage module; the model data of the network model comprises program code of the network model;

acquiring configuration information for configuring each task model by a research and development user, and storing the configuration information into the configuration information storage module; the configuration information of each task model comprises: the configuration information of the network model used by the task model, the hyper-parameters of the network model and the configuration information of the data set used by the task model.

the method further comprises the following steps:

receiving program codes of all strategies imported by a research and development user and storing the program codes to the strategy information storage module;

obtaining policy configuration information configured by a research and development user for at least one task model, and storing the policy configuration information into the configuration information storage module, wherein the policy configuration information comprises: the identifier of at least one strategy that the task model needs to call, and strategy parameter configuration information of each strategy; one strategy identifier corresponds to one sub-algorithm of an advanced machine learning algorithm.

In another aspect of the embodiments of the present invention, an electronic device is provided, which includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

and the processor is used for realizing the steps of the training or configuration method of any one of the task models when executing the program stored in the memory.

The embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the method for training or configuring the task model described above is implemented.

Embodiments of the present invention also provide a computer program product containing instructions, which when run on a computer, cause the computer to perform any one of the above methods for training or configuring a task model.

The embodiment of the invention has the following beneficial effects:

In the machine learning engine service system provided by the embodiment of the invention, the model management module acquires the corresponding target configuration information from the configuration information storage module based on the target task determined by the user; the configuration information of each task model includes the structural information and hyper-parameters of the network model used by that task model. The model management module then constructs and loads the target task model according to the target configuration information and the model data of the network model stored in the model information storage module, calls the model training engine according to the target training flow of the target task, and trains the target task model on the target data set loaded by the data management module. Because the configuration information of a task model contains the information that must be called when the task model is trained, a user can directly call the corresponding configuration file to train the model and obtain the desired result without code migration, which lowers the barrier to entry for machine learning and improves its robustness.

Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other embodiments from these drawings.

Fig. 1 is a first schematic structural diagram of a machine learning engine service system according to an embodiment of the present invention;

Fig. 2 is a second schematic structural diagram of a machine learning engine service system according to an embodiment of the present invention;

Fig. 3a is a third schematic structural diagram of a machine learning engine service system according to an embodiment of the present invention;

Fig. 3b is a schematic diagram of an external interface of the machine learning engine service system according to an embodiment of the present invention;

Fig. 4 is a fourth schematic structural diagram of a machine learning engine service system according to an embodiment of the present invention;

Fig. 5 is a fifth schematic structural diagram of a machine learning engine service system according to an embodiment of the present invention;

Fig. 6a is an architecture diagram of a machine learning engine service system according to an embodiment of the present invention;

Fig. 6b is a schematic diagram of the variation of a loss curve;

Fig. 7 is a schematic diagram of a training process of a machine learning engine service system according to an embodiment of the present invention;

Fig. 8 is a first flowchart of a task model training method according to an embodiment of the present invention;

Fig. 9 is a second flowchart of a task model training method according to an embodiment of the present invention;

Fig. 10 is a flowchart of the detailed training steps of a task model training method according to an embodiment of the present invention;

Fig. 11 is a first flowchart of a task model configuration method according to an embodiment of the present invention;

Fig. 12 is a second flowchart of a task model configuration method according to an embodiment of the present invention;

Fig. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;

Fig. 14 is a schematic structural diagram of another electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived from the embodiments given herein by one of ordinary skill in the art, are within the scope of the invention.

In order to improve robustness of machine learning, the embodiment of the invention provides a machine learning engine service system, a model training method and a configuration method. First, a machine learning engine service system provided by an embodiment of the present invention is described in detail below.

As shown in Fig. 1, which is a schematic structural diagram of a machine learning engine service system provided by an embodiment of the present invention, the system may include: a model management module 101, a configuration information storage module 102, a data management module 103, a model information storage module 104, and a model training engine 105;

the model training engine 105 is used for training a network model;

the configuration information storage module 102 is configured to store configuration information of each task model provided by the system; the configuration information of each task model comprises: structural information of a network model used by the task model, hyper-parameters of the network model, and configuration information of a data set used by the task model;

the model information storage module 104 is configured to store model data of each network model provided by the system; the model data of the network model comprises program code of the network model;

the model management module 101 is configured to obtain target configuration information of a corresponding target task model from the configuration information storage module 102 based on a target task determined by a user, and construct and load the target task model based on the target configuration information and model data of a network model stored in the model information storage module 104; and calling the data management module 103 to load a target data set; calling the model training engine 105 according to a preset target training flow of the target task, and training the target task model based on the target data set; the target training process comprises: a sequence of training the target network model in the target configuration information;

the data management module 103 is configured to load a target data set based on configuration information of the target data set in the target configuration information when called by the model management module 101.

In the embodiment of the present invention, a user may select a task provided by the machine learning engine service system to meet a use requirement of the user, where the task may include classification, detection, and the like, and the corresponding task model may be a classifier model, a detector model, and the like, which is not specifically limited herein. Different task model configuration information can be configured in advance for different tasks and stored in the configuration information storage module.

As a specific implementation manner of the embodiment of the present invention, the configuration information of each task model may be stored in a configuration file. The configuration file may be written by a developer in a development process, and of course, when the user uses the machine learning engine service system, the configuration file in the system configuration information storage module may also be rewritten to meet the use requirement of the user.

As described above, the model information storage module stores model data of each network model provided by the system. The network models may include a ResNet network, an RPN network, an RCNN network, and the like, and the model data of each network model refers to the specific program code of that network model. A task model may include one or more of the network models. Accordingly, the configuration information of the task model may include the structure and hyper-parameters of each network model used by the task model, for example, network structure information such as the number of network layers, the number of input channels, and the number of output channels, and hyper-parameters such as the learning rate.
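As an illustration of how such configuration information might drive model construction, the sketch below parses a configuration file and looks up the network's code by name. Everything here is an assumption made for illustration: the JSON layout, the key names, the placeholder path `data/train.json`, the `NetworkModel` stand-in class, and the `MODEL_STORE` dictionary playing the role of the model information storage module.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical configuration file content for one task model.
CONFIG_TEXT = """
{
  "task": "classification",
  "network": {"name": "resnet", "num_layers": 18,
              "in_channels": 3, "out_channels": 10},
  "hyper_parameters": {"learning_rate": 0.01},
  "dataset": {"path": "data/train.json", "test_ratio": 0.2}
}
"""


@dataclass
class NetworkModel:
    """Stand-in for the program code of a real network model."""
    name: str
    num_layers: int
    in_channels: int
    out_channels: int


# Stand-in for the model information storage module:
# network name -> code that constructs the network.
MODEL_STORE: Dict[str, Callable[..., NetworkModel]] = {"resnet": NetworkModel}


def load_config(text: str) -> Dict[str, Any]:
    """Parse the configuration information of a task model."""
    return json.loads(text)


def build_task_model(config: Dict[str, Any]) -> NetworkModel:
    """Construct the network named in the config from the stored model code."""
    network_cfg = config["network"]
    constructor = MODEL_STORE[network_cfg["name"]]
    return constructor(**network_cfg)
```

With this layout, switching to a different network or learning rate would only require editing the configuration text, which is the behavior the description attributes to configuration files.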

As a specific implementation of the embodiment of the present invention, based on fig. 1, as shown in fig. 2, the system may further include: a policy management module 206 and a policy information storage module 207;

in the configuration information stored in the configuration information storage module 102, the configuration information of at least one task model further includes policy configuration information; the policy configuration information includes: the identifier of at least one strategy that the task model needs to call, and strategy parameter configuration information of each strategy; one strategy identifier corresponds to one sub-algorithm of an advanced machine learning algorithm;

the policy information storage module 207 is configured to store program codes of each policy provided by the system;

the model management module 101 is specifically configured to obtain target configuration information of a corresponding target task model from the configuration information storage module 102 based on a target task determined by a user, and construct and load the target task model based on the target configuration information and model data of a network model stored in the model information storage module 104; and calling the data management module 103 to load a target data set;

under the condition that the configuration information of the target task model comprises strategy configuration information, calling the model training engine 105 and the strategy management module 206 according to a preset target training flow of the target task, and training the target task model based on the target data set; the target training process comprises: training a target network model in the target configuration information, and loading and executing a target strategy in the target configuration information;

the policy management module 206 is configured to, when called by the model management module, load and execute the target policy based on the program code of the policy stored in the policy information storage module 207 and the policy configuration information of the target task model, and feed back a policy execution result to the model management module 101.

As described above, the policy information storage module stores the program code of each strategy provided by the system. In the embodiment of the present invention, a strategy (also referred to as a policy) is a specific sub-algorithm of an advanced machine learning algorithm, such as an SVM algorithm, a KNN algorithm, an augmented divergence method, or a Least Confidence method; an advanced machine learning algorithm refers to a machine learning idea or paradigm and is generally implemented by combining a plurality of sub-algorithms. Each task model may use one or more of these sub-algorithms (strategies). Correspondingly, the configuration information of the task model may further include policy configuration information, which may include the identifiers of the strategies and strategy parameter configuration information. Strategy identifiers correspond to strategies one to one, so that when a strategy is called based on a configuration file, the specific strategy can be located by its identifier; the strategy parameter configuration information specifies the parameters with which the sub-algorithm is executed. As a specific implementation manner of the embodiment of the present invention, a strategy identifier may be generated for each sub-algorithm (strategy) and added to the configuration file when the configuration file is written.

Therefore, each strategy can be called by different task models, the reusability of each sub-algorithm is improved, and the cost of algorithm development is reduced.
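One way to picture identifier-based strategy reuse is a registry keyed by strategy identifier, sketched below. The registry `STRATEGY_STORE`, the decorator `register_strategy`, the config keys `"id"` and `"params"`, and the parameter `top_k` are hypothetical names chosen for this example; the Least Confidence rule shown (select the samples whose highest class probability is lowest) is the standard active-learning query rule, used here as a stand-in for whatever implementation the system stores.

```python
from typing import Any, Callable, Dict, List, Sequence

# Stand-in for the policy information storage module:
# strategy identifier -> program code of the strategy.
STRATEGY_STORE: Dict[str, Callable[..., Any]] = {}


def register_strategy(strategy_id: str):
    """Store a strategy's code under its identifier."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        STRATEGY_STORE[strategy_id] = fn
        return fn
    return decorator


@register_strategy("least_confidence")
def least_confidence(
    probabilities: Sequence[Sequence[float]], top_k: int = 1
) -> List[int]:
    """Return indices of the top_k samples with the lowest top-class probability."""
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return ranked[:top_k]


def execute_strategy(policy_config: Dict[str, Any], *args: Any) -> Any:
    """Look up a strategy by identifier and run it with its configured parameters."""
    fn = STRATEGY_STORE[policy_config["id"]]
    return fn(*args, **policy_config.get("params", {}))
```

For example, `execute_strategy({"id": "least_confidence", "params": {"top_k": 2}}, probs)` would pick the two least confident samples from `probs`. Any task model whose configuration names `"least_confidence"` would reach the same code, without copying it.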

As described above, the configuration information of each task model includes configuration information of the network model used by the task model, the hyper-parameters of the network model, the strategy identifiers of the strategies to be called by the task model, and strategy parameter configuration information; it may further include configuration information of the data set used by the task model. Based on the data set configuration information, the data management module may build and divide the data set, and may also add data to or delete data from it. For example, the data set may be divided into a training set and a test set, and the results of these operations are loaded for use in the model training process. As a specific implementation manner of the embodiment of the present invention, the data management module may convert the data set into a unified format, such as JSON. The data management module may include an incremental dataset management (IncDatasetManager) sub-module, an active learning dataset management (ALDatasetManager) sub-module, and the like.
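The dividing and format-unifying operations can be sketched as below. This is only an illustration under stated assumptions: the config keys `"samples"`, `"test_ratio"` and `"seed"`, and the function names, are invented here, and a real data management module would read samples from storage rather than from the configuration itself.

```python
import json
import random
from typing import Any, Dict, List, Tuple

Sample = Dict[str, Any]


def split_dataset(
    samples: List[Sample], test_ratio: float, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle deterministically, then divide into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def load_target_dataset(dataset_config: Dict[str, Any]) -> Dict[str, List[Sample]]:
    """Build the target data set from (hypothetical) data set configuration."""
    train, test = split_dataset(
        dataset_config["samples"],
        dataset_config["test_ratio"],
        dataset_config.get("seed", 0),
    )
    return {"train": train, "test": test}


def to_unified_json(dataset: Dict[str, List[Sample]]) -> str:
    """Serialize the data set into a single unified JSON representation."""
    return json.dumps(dataset, sort_keys=True)
```

Fixing the shuffle seed in the configuration makes the division reproducible across training runs, which is a design choice of this sketch rather than something the description requires.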

In the embodiment of the present invention, after the model management module in the machine learning engine service system obtains the target task selected by the user, it may obtain the corresponding target configuration information from the configuration information storage module, construct a target task model according to the target configuration information, call the model training engine and the policy management module, and train the target task model based on the target data set loaded by the data management module. As a specific implementation manner, the target data set may include a training set and a test set, and the model management module may train the task model on the training set, evaluate the task model on the test set, and finally output a trained model file and an evaluation result. In the embodiment of the present invention, the model training engine may be Caffe, TensorFlow, PyTorch, or the like.

In an embodiment of the present invention, the model management module may include a semi-supervised detector management (SemiDetectorManager) sub-module, an active learning detector management (ALDetectorManager) sub-module, an active learning classifier management (ALClassifierManager) sub-module, and the like.

As a specific implementation manner of the embodiment of the present invention, when performing model training, the model training may be performed according to a preset target training procedure, where the target training procedure may include a network model training sequence in the target configuration information, and as described above, in the case that the configuration information of the target task model includes policy configuration information, the target training procedure may further include an execution sequence of a target policy.

Generally, the learning methods used in model training mainly include active learning, semi-supervised learning, and incremental learning. Each learning method can be abstracted, according to its algorithmic idea, into an operation flow composed of a plurality of atomic algorithms. For example, in semi-supervised learning, unlabeled data is generally obtained and input into a trained teacher model to obtain the corresponding output labels; at the same time, the unlabeled data is input into the model to be trained (the student model) to obtain the corresponding output pseudo labels; consistency training is then performed based on the labels output by the teacher model and the pseudo labels output by the student model. This process can be abstracted into the following flow: obtaining unlabeled data, assigning pseudo labels to the unlabeled data, and performing consistency training based on the pseudo labels. A developer can develop and optimize each link in the operation flow, and the target training flow is the result of that development and optimization.
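The three-link abstraction of semi-supervised learning can be sketched as three small functions that a developer could optimize independently. This is a toy illustration under stated assumptions: the teacher is a fixed function, the student is a single weight `w`, the thresholded teacher output is used as the pseudo label, and consistency is measured by mean squared error. None of these choices is prescribed by the embodiment.

```python
from typing import List


def get_unlabeled_data() -> List[float]:
    """Link 1: obtain unlabeled data (fixed toy values here)."""
    return [0.1, 0.4, 0.6, 0.9]


def teacher(x: float) -> float:
    """Stand-in for a trained teacher model that outputs a score."""
    return x


def assign_pseudo_labels(data: List[float]) -> List[float]:
    """Link 2: label the unlabeled data by thresholding the teacher output."""
    return [1.0 if teacher(x) >= 0.5 else 0.0 for x in data]


def consistency_loss(w: float, data: List[float], labels: List[float]) -> float:
    """Mean squared difference between the student output w * x and the labels."""
    return sum((w * x - y) ** 2 for x, y in zip(data, labels)) / len(data)


def consistency_step(w: float, data: List[float], labels: List[float], lr: float = 0.5) -> float:
    """Link 3: one gradient step of consistency training on the student weight."""
    grad = sum(2 * (w * x - y) * x for x, y in zip(data, labels)) / len(data)
    return w - lr * grad


data = get_unlabeled_data()
labels = assign_pseudo_labels(data)
loss_before = consistency_loss(0.0, data, labels)
w = consistency_step(0.0, data, labels)
loss_after = consistency_loss(w, data, labels)
```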

As a specific implementation manner of the embodiment of the present invention, based on fig. 2, as shown in fig. 3a, fig. 3a is a third schematic structural diagram of a machine learning engine service system provided in the embodiment of the present invention; the system may further include a plurality of training interfaces 304, where each of the training interfaces corresponds to a task model, and each training interface includes a training procedure of the corresponding task model;

therefore, when the configuration information of the target task model includes policy information, the model management module 101 may call the model training engine 105 and the policy management module 206 according to a preset target training flow of the target task, and train the target task model based on the target data set, as follows: determining a target training interface corresponding to the target task; and calling the target training interface, so that the model training engine 105 and the policy management module 206 are called according to the preset target training flow of the target task to train the target task model based on the target data set.

That is to say, as a specific implementation manner of the embodiment of the present invention, after the user selects the target task, the machine learning engine service system may call the model training engine and the policy management module through the training interface corresponding to the target task, and acquire the model data of the network model included in the corresponding task model from the model information storage module 104 to train the target task model. As shown in fig. 3b, the training interface (interface layer) may include an incremental interface, an active interface, a semi-supervised interface, and the like, and a user (user layer) may select a target task, that is, determine corresponding target configuration information, a target task model, and a target data set, and then invoke an engine service (algorithm layer) through the corresponding interface to train the target task model, where the engine service is specifically executed by the model training engine and the policy management module.
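A minimal sketch of the interface layer follows, assuming each training interface is a function that holds the training flow of its task model. The task names, the step strings, and the `train` entry point are hypothetical; in the described system the steps would be executed by the model training engine and the policy management module rather than returned as strings.

```python
from typing import Callable, Dict, List

# Hypothetical interface layer: one training interface per task model.
TRAINING_INTERFACES: Dict[str, Callable[[dict], List[str]]] = {}


def training_interface(task_name: str):
    """Register a training interface for a task."""
    def decorator(func):
        TRAINING_INTERFACES[task_name] = func
        return func
    return decorator


@training_interface("semi_supervised")
def semi_supervised_interface(config: dict) -> List[str]:
    # Training flow held by this interface (step names are illustrative).
    return ["load:" + config["dataset"], "policy:pseudo_label", "train:" + config["network"]]


@training_interface("active")
def active_interface(config: dict) -> List[str]:
    return ["load:" + config["dataset"], "policy:least_confidence", "train:" + config["network"]]


def train(target_task: str, target_config: dict) -> List[str]:
    """User-layer entry point: determine the target interface and run its flow."""
    if target_task not in TRAINING_INTERFACES:
        raise KeyError(f"no training interface for task: {target_task}")
    return TRAINING_INTERFACES[target_task](target_config)


executed = train("active", {"dataset": "demo_set", "network": "yolo_v3"})
```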

As described above, the machine learning engine service system provided in the embodiment of the present invention enables a user to autonomously select a target task and train a model according to the target task. In addition, the machine learning engine service system can also serve developer users (research and development users), who can refine and upgrade established algorithm ideas based on the system.

Correspondingly, as a specific implementation manner of the embodiment of the present invention, based on fig. 2, as shown in fig. 4, the machine learning engine service system may further include a model configuration module 408;

the model configuration module 408 is configured to obtain configuration information for configuring the task model by a research and development user, and store the configuration information into the configuration information storage module;

research and development personnel can configure the configuration information of different task models according to different task requirements, and the configuration information of each task model can contain information on one or more network models, information on one or more policies, and data set configuration information.

The model information storage module 104 is further configured to store model data of a new network model imported by a research and development user;

the policy information storage module 207 is further configured to store program codes of new policies introduced by the development user.

If the network models and policies that the research and development user references when configuring the configuration information of a task model are already stored in the model information storage module and the policy information storage module, no new network model or policy program code needs to be imported. If the task model uses a network model and/or policy that is not yet stored, the research and development user needs to import the program code of the new network model and/or policy into the corresponding storage module.
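The check described above amounts to comparing the names referenced in a task configuration against what the two storage modules already hold. The sketch below assumes the stores can be represented as sets of names and that a configuration lists its network models and policies under the hypothetical keys `network_models` and `policies`.

```python
from typing import List, Set, Tuple


def find_missing(task_config: dict, stored_models: Set[str],
                 stored_policies: Set[str]) -> Tuple[List[str], List[str]]:
    """Return the network models and policies that still have to be imported."""
    missing_models = [m for m in task_config["network_models"] if m not in stored_models]
    missing_policies = [p["id"] for p in task_config.get("policies", [])
                        if p["id"] not in stored_policies]
    return missing_models, missing_policies


# Hypothetical contents of the model and policy information storage modules.
stored_models = {"yolo_v3"}
stored_policies = {"least_confidence"}

task_config = {
    "network_models": ["yolo_v3", "new_net"],
    "policies": [{"id": "least_confidence"}, {"id": "new_policy"}],
}
missing = find_missing(task_config, stored_models, stored_policies)
```

An empty result for both lists corresponds to the case in which nothing new has to be imported.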

As a specific implementation manner of the embodiment of the present invention, based on fig. 2, as shown in fig. 5, the machine learning engine service system further includes a data output module 508;

the data output module 508 is configured to output various data to be output according to a preset unified output format.

As a specific implementation manner of the embodiment of the present invention, the data to be output may include a trained model, an operation result list, and the like. If the model training engine is PyTorch, the trained model may be output as a .pth file and the operation result list may be output in HTML format; the data output module may package the output data into a unified JSON format for output.
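One possible shape for the unified output is sketched below. The field names (`model`, `result_list`, `metrics`), the file paths, and the metric value are illustrative assumptions; the embodiment states only that the outputs are packaged into a unified JSON format.

```python
import json


def package_output(model_file: str, result_list_file: str, metrics: dict) -> str:
    """Package references to the output artifacts into one JSON document."""
    payload = {
        "model": {"path": model_file, "format": model_file.rsplit(".", 1)[-1]},
        "result_list": {"path": result_list_file,
                        "format": result_list_file.rsplit(".", 1)[-1]},
        "metrics": metrics,
    }
    return json.dumps(payload, sort_keys=True)


# Hypothetical paths and metric value, for illustration only.
packaged = package_output("output/model.pth", "output/results.html", {"accuracy": 0.91})
```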

As a specific implementation manner of the embodiment of the present invention, as shown in fig. 6a, the machine learning engine service system 600 may further include one or more of a script presentation module 608, a tool module 609, and a log analysis module 610, in addition to the model management module 601, the configuration information storage module 602, the data management module 603, the model information storage module 604, the model training engine 605, the policy management module 606, and the policy information storage module 607:

the script display module 608 is configured to display, to the user, a script instance of the process in which the model management module performs the target task model training based on a display instruction triggered by the user and/or the research and development user;

as a specific implementation manner of the embodiment of the present invention, the user or the research and development user may trigger the display instruction by clicking a "display" button, and the like, which is not limited herein. After receiving the display instruction, the system can display the model training process to the user, for example, the specific running process of the training code can be displayed to the user.

The tool module 609 is used for providing auxiliary tools for users and/or research and development users; the auxiliary tool at least comprises: the characteristic visualization display tool is used for displaying characteristic change and/or characteristic distribution in the training process of the target task model;

as a specific implementation manner of the embodiment of the present invention, feature visualization may be performed during training and may take multiple forms, such as loss value visualization and visualization of single-channel or multi-channel feature maps. As shown in fig. 6b, which is a loss curve showing how the test loss and the training loss vary with the number of iterations, the model training process can be displayed to the user visually. The auxiliary tools may further include a test tool and the like, which is not limited herein.

The log analysis module 610 is configured to analyze log data generated by the model management module in the process of training the target task model.

The log data can record various operation data in the process of training the target task model, and a user can analyze abnormal parts in the model training process according to the log data and modify and optimize the abnormal parts.
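As a hedged illustration of what analyzing abnormal parts of a training log could involve, the sketch below scans log lines for a non-finite loss or a sudden loss spike. The log line format `iter=<n> loss=<value>` and the spike rule are assumptions; the embodiment does not define a log format or an anomaly criterion.

```python
import math
import re
from typing import List, Tuple

# Assumed log line format: "iter=<n> loss=<value>".
LOG_LINE = re.compile(r"iter=(\d+)\s+loss=(\S+)")


def find_anomalies(lines: List[str], spike_factor: float = 2.0) -> List[Tuple[int, str]]:
    """Flag iterations whose loss is non-finite or jumps above spike_factor times the previous loss."""
    anomalies = []
    previous = None
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # ignore lines that carry no loss record
        iteration, loss = int(match.group(1)), float(match.group(2))
        if not math.isfinite(loss):
            anomalies.append((iteration, "non-finite loss"))
            continue
        if previous is not None and loss > spike_factor * previous:
            anomalies.append((iteration, "loss spike"))
        previous = loss
    return anomalies


log_lines = [
    "iter=1 loss=0.9",
    "iter=2 loss=0.7",
    "iter=3 loss=2.1",
    "iter=4 loss=nan",
    "iter=5 loss=0.6",
]
anomalies = find_anomalies(log_lines)
```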

As shown in fig. 7, fig. 7 is a schematic diagram of a training process of the machine learning engine service system according to an embodiment of the present invention, which shows a semi-supervised detection method based on YOLO. The method first reads the target configuration file solution/xx_solution.py through a Config.fromfile instruction and then calls the model manager (model_manager). When the detector is constructed according to the network model structure configuration file model/builder.py in the target configuration file, the __init__ function is used to initialize the model according to the detector model configuration file model/detectors/YOLO_v3.py, and the detector (build_detector), the backbone network (build_backbone), the neck network (build_neck), and the head network (build_head) are constructed. The backbone network is the backbone of the task model and serves as the feature extraction network; the head network is the network that produces the network output and makes predictions using the previously extracted features; and the neck network is placed between the backbone network and the head network so as to make better use of the features extracted by the backbone network.

The data manager (dataset_manager) may construct a data set based on the data set configuration information (dataset/builder.py) in the target configuration file, and the policy manager (policy_manager) may construct a policy based on the target policy configuration information (policy/builder.py) in the target configuration file, that is, call the policy saved in the policy information storage module. Based on the training interface configuration information (api/train), the training model interface (train_model) may use the _non_dist_train() function to load the target data set (constructing a DataLoader), call a Runner (the model training engine), and register a hook interface.

Then, based on the model training engine configuration information, the operation interface of the model training engine is configured through the runner/runner() method, and iterative training is performed through the runner/runner() method. In each round of training, the batch processor runs the model forward pass and calculates the error: specifically, the model forward pass is implemented by the Forward() method based on model/detector/base(), and the error is then calculated by calling the model training forward algorithm through the forward_runner() method based on model/detectors/tw_stage(). As described above, a semi-supervised learning algorithm generally involves a teacher model and a student model, and consistency training is performed during training on the labels output by the two models; the calculated error may therefore be the error between the label output by the teacher model and the pseudo label output by the student model. Next, the hook interface (runner_hook('after_train_iter')) is called and the error calculation result is sent to an optimizer (Optimizer); based on the optimizer interface configuration of the model training engine (runner/hook/optimizer), the optimizer hook interface (Optimizer_hook/Optimizer()) in the model training engine is called to adjust the model parameters through a gradient back-propagation algorithm. The flow then returns to the step of iterative training through the runner_train() method until the model meets a preset standard, where the preset standard may be that the error is smaller than a preset threshold.
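The interplay of the runner, the after_train_iter hook, and the optimizer can be sketched in plain Python as follows. This is not the code of the embodiment or of any particular framework: the `Runner` class, its method names, the one-weight model, and the learning rate are simplified assumptions chosen so that the loop is runnable on its own.

```python
class Runner:
    """Toy training engine: forward pass, error calculation, hook call, repeat."""

    def __init__(self, weight: float = 0.0, lr: float = 0.1):
        self.weight = weight
        self.lr = lr
        self.hooks = {}
        self.outputs = {}
        self.iterations = 0

    def register_hook(self, event: str, func) -> None:
        self.hooks.setdefault(event, []).append(func)

    def call_hook(self, event: str) -> None:
        for func in self.hooks.get(event, []):
            func(self)

    def train(self, batch, threshold: float = 1e-6, max_iters: int = 1000) -> float:
        for _ in range(max_iters):
            # Forward pass and error against the target labels.
            errors = [self.weight * x - y for x, y in batch]
            loss = sum(e * e for e in errors) / len(batch)
            if loss < threshold:  # preset standard: error below a preset threshold
                break
            grad = sum(2 * e * x for e, (x, _y) in zip(errors, batch)) / len(batch)
            self.outputs = {"loss": loss, "grad": grad}
            self.iterations += 1
            # Hand the error result to whatever is registered on this event.
            self.call_hook("after_train_iter")
        return self.weight


def optimizer_hook(runner: Runner) -> None:
    """Adjust the model parameter from the gradient computed in this iteration."""
    runner.weight -= runner.lr * runner.outputs["grad"]


runner = Runner()
runner.register_hook("after_train_iter", optimizer_hook)
# Each pair is (input, target label); the targets correspond to a weight of 2.0.
final_weight = runner.train([(1.0, 2.0), (2.0, 4.0)])
```

Keeping the parameter update in a hook, rather than inside the loop, means the update rule can be swapped through configuration without touching the runner itself.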

It can be seen from the foregoing embodiments that the machine learning engine service system provided by the embodiments of the present invention can be directly invoked without code migration when invoking each policy, thereby reducing the threshold of machine learning and improving the robustness of machine learning.

Based on the same technical concept as the machine learning engine service system, the embodiment of the present invention further provides a training method of a task model, which is applied to the machine learning engine service system, as shown in fig. 8, where fig. 8 is a flowchart of the training method of the model provided by the embodiment of the present invention; the method may specifically comprise the steps of:

step 801, a model management module obtains a target task determined by a user;

step 802, obtaining target configuration information of a corresponding target task model from the configuration information storage module;

step 803, based on the target configuration information and the model data of the network model stored in the model information storage module, constructing and loading the target task model; the model data of the network model comprises program code of the network model;

step 804, the data management module is called, and a target data set is loaded based on the configuration information of the target data set in the target configuration information;

step 805, calling the model training engine according to a preset training flow of the target task, and training the target task model based on the target data set; wherein, the training process comprises: and training the target network model in the target configuration information.
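Steps 801 to 805 can be traced end to end in the following sketch, which replaces the storage modules with in-memory dictionaries. The store contents, the key names, and the trivial "training" that only counts epochs are hypothetical stand-ins intended to show the order of the calls, not an actual training procedure.

```python
# Hypothetical in-memory stand-ins for the storage modules.
CONFIG_STORE = {
    "demo_task": {
        "network": "linear_net",
        "hyper_parameters": {"epochs": 3},
        "dataset": {"size": 4},
    }
}
MODEL_STORE = {
    # "Program code" of a network model, reduced here to a factory function.
    "linear_net": lambda hyper: {"name": "linear_net", "epochs_done": 0, **hyper},
}


def load_dataset(dataset_config: dict) -> list:
    """Data management module: load the target data set from its configuration."""
    return list(range(dataset_config["size"]))


def training_engine(model: dict, dataset: list) -> dict:
    """Model training engine: a placeholder loop that only counts epochs."""
    for _ in range(model["epochs"]):
        model["epochs_done"] += 1
    model["samples_seen"] = model["epochs_done"] * len(dataset)
    return model


def run_training(target_task: str) -> dict:
    config = CONFIG_STORE[target_task]                                  # steps 801-802
    model = MODEL_STORE[config["network"]](config["hyper_parameters"])  # step 803
    dataset = load_dataset(config["dataset"])                           # step 804
    return training_engine(model, dataset)                              # step 805


trained = run_training("demo_task")
```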

As a specific implementation manner of the embodiment of the present invention, as described above, the system may further include: the system comprises a policy information storage module and a policy management module;

correspondingly, based on fig. 8, as shown in fig. 9, the step 805 may specifically include the following steps:

step 905, under the condition that the configuration information of the target task model contains strategy information, calling the model training engine and the strategy management module according to a preset target training flow of the target task, and training the target task model based on the target data set; the target training process comprises: training a target network model in the target configuration information, and loading and executing a target strategy in the target configuration information.

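As a specific implementation manner of the embodiment of the present invention, as described above, the machine learning engine service system may further include a plurality of training interfaces, where each training interface corresponds to one task model and includes the training flow of the corresponding task model. Therefore, step 905 may specifically include: determining a target training interface corresponding to the target task, and calling the target training interface, so that the model training engine and the policy management module are called according to the preset target training flow of the target task to train the target task model based on the target data set.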

As a specific implementation manner of the embodiment of the present invention, as shown in fig. 10, the step 905 may specifically include the following steps:

1001, determining a current training flow step to be executed based on a preset training flow of the target task;

in an embodiment of the present invention, the current training process steps may include executing a target policy and training a target network model.

Step 1002, in the case that the current training flow step is a step of executing a target strategy, calling the strategy management module, obtaining a program code of the target strategy from the strategy information storage module for loading, and executing the target strategy in the current training flow step;

specifically, the policy management module may obtain the program code of the target policy from the policy information storage module according to the policy identifier in the target configuration information and the configuration information of the policy, and load the program code, so that the program code can be used for operations such as calculation and pseudo-labeling in the model training process.

Step 1003, receiving a policy execution result fed back by the policy management module;

as a specific implementation manner of the embodiment of the present invention, the policy execution result fed back by the policy management module may include pseudo tag information of the sample, and the like.

Step 1004, when the current training process step is to train the target network model in the target configuration information, calling the model training engine, and training the target network model based on the target data set;

step 1005, receiving a training result fed back by the model training engine;

as a specific implementation manner of the embodiment of the present invention, model save points (checkpoints), that is, saved snapshots of the model, may be set in the configuration file. For example, the checkpoint write interval may be set to 20 minutes and the 10 most recent checkpoints may be retained, which is not specifically limited herein. The model training engine may return the model saved at a checkpoint to the model management module for iterative training.
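The example settings of a 20-minute write interval and 10 retained checkpoints can be modeled as below. The `CheckpointManager` class and its in-memory storage are assumptions for illustration; a real engine would write snapshot files rather than keep dictionaries in memory.

```python
from collections import deque


class CheckpointManager:
    """Keep a snapshot at most once per interval and retain only the newest ones."""

    def __init__(self, interval_s: int = 20 * 60, keep: int = 10):
        self.interval_s = interval_s
        self.saved = deque(maxlen=keep)  # older snapshots fall off automatically
        self._last_save = None

    def maybe_save(self, now_s: int, state: dict) -> bool:
        """Save a copy of the state if the write interval has elapsed."""
        if self._last_save is None or now_s - self._last_save >= self.interval_s:
            self.saved.append((now_s, dict(state)))
            self._last_save = now_s
            return True
        return False


manager = CheckpointManager()
# Simulate four hours of training, polling every ten minutes.
for minute in range(0, 241, 10):
    manager.maybe_save(minute * 60, {"minute": minute})

latest_time, latest_state = manager.saved[-1]
```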

Step 1006, judging whether the current training flow step is the last one in the training flow; if not, returning to step 1001 to determine the current training flow step to be executed based on the preset training flow of the target task; if yes, going to step 1007 to finish training.
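The loop formed by steps 1001 to 1007 can be sketched as a dispatcher over an ordered list of flow steps. Representing a flow step as a `(kind, target)` pair and the two callables standing in for the policy management module and the model training engine are assumptions made for this sketch.

```python
from typing import Callable, List, Tuple


def run_training_flow(flow: List[Tuple[str, str]],
                      policy_manager: Callable[[str], str],
                      training_engine: Callable[[str], str]) -> List[str]:
    """Execute each flow step in order and collect the fed-back results."""
    results: List[str] = []
    if not flow:
        return results
    index = 0
    while True:
        kind, target = flow[index]                   # step 1001
        if kind == "policy":
            results.append(policy_manager(target))   # steps 1002 and 1003
        elif kind == "train":
            results.append(training_engine(target))  # steps 1004 and 1005
        else:
            raise ValueError(f"unknown flow step kind: {kind}")
        if index == len(flow) - 1:                   # step 1006: last step reached
            return results                           # step 1007: finish training
        index += 1


# Hypothetical target training flow and module stand-ins.
flow = [("policy", "pseudo_label"), ("train", "student_net"), ("policy", "least_confidence")]
results = run_training_flow(
    flow,
    policy_manager=lambda policy_id: f"policy {policy_id} executed",
    training_engine=lambda network: f"{network} trained",
)
```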

After the training is finished, the trained model can be tested in the test set, and retraining or outputting results and the like are carried out according to the test results. As a specific implementation manner of the embodiment of the present invention, various data to be output may be output according to a preset unified output format.

The above training method of the task model has been described in detail in the foregoing system embodiments and is therefore only briefly described here.

The training method of the task model provided by the embodiment of the invention is applied to the machine learning engine service system, the model management module acquires corresponding target configuration information from the configuration information storage module based on a target task determined by a user, constructs and loads the target task model according to the target configuration information and model data of the network model stored in the model information storage module, calls the model training engine and the strategy management module according to a target training flow of the target task, and trains the target task model based on a target data set loaded by the data management module. According to the task model training method provided by the embodiment of the invention, the configuration information of the task model comprises the information of the strategies required to be called by the task model, and the program codes of the strategies are stored in the strategy information storage module, so that a user can directly call the corresponding strategies without code migration when using the task model, and thus, the machine learning threshold is reduced, and the machine learning robustness is improved.

Based on the same technical concept as the machine learning engine service system, an embodiment of the present invention further provides a method for configuring a task model, which is applied to the machine learning engine service system, and as shown in fig. 11, the method specifically includes the following steps:

step 1101, receiving model data of each network model imported by a research and development user, and storing the model data to the model information storage module; the model data of the network model comprises program code of the network model;

1102, acquiring configuration information for configuring each task model by a research and development user, and storing the configuration information into a configuration information storage module; the configuration information of each task model comprises: the configuration information of the network model used by the task model, the hyper-parameters of the network model and the configuration information of the data set used by the task model.

As an implementation manner of the embodiment of the present invention, as described above, the machine learning engine service system may further include a policy information storage module and a policy management module;

correspondingly, based on fig. 11, as shown in fig. 12, the method may further include the following steps:

step 1203, receiving program codes of all the strategies imported by the research and development user, and storing the program codes into the strategy information storage module;

step 1204, obtaining policy configuration information configured by a research and development user for at least one task model, and storing the policy configuration information into the configuration information storage module, wherein the policy configuration information includes: the policy identifier of at least one policy that the task model needs to call, and the policy parameter configuration information of each policy; one policy identifier corresponds to one sub-algorithm in the high-level machine learning algorithm.

The above process has been described in detail in the foregoing system embodiments, and is not described herein again.

The configuration information of each task model includes the structure information and hyper-parameters of the network model used by the task model, the data set configuration information, the policy identifiers of the policies to be called, and the configuration information of each policy, where one policy identifier corresponds to one sub-algorithm in a high-level machine learning algorithm. According to the task model configuration method provided by the embodiment of the invention, the configuration information of each task model includes the information of the policies to be called by the task model, and the program code of each policy is stored in the policy information storage module, so that a user can directly call the corresponding policy when using the task model without code migration; the threshold of machine learning is thereby reduced and the robustness of machine learning is improved.
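A task configuration of the kind summarized above could be checked for completeness before it is stored. The key names (`network_model`, `hyper_parameters`, `dataset`, `policies`, `id`, `params`) and the example values are hypothetical; the embodiment lists the categories of information but not a concrete schema.

```python
from typing import List

REQUIRED_KEYS = ("network_model", "hyper_parameters", "dataset")


def validate_task_config(config: dict) -> List[str]:
    """Return a list of problems; an empty list means the configuration is complete."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    for index, policy in enumerate(config.get("policies", [])):
        if "id" not in policy:
            problems.append(f"policy {index} has no identifier")
        if "params" not in policy:
            problems.append(f"policy {index} has no parameter configuration")
    return problems


# Hypothetical configuration of one task model.
complete_config = {
    "network_model": {"type": "yolo_v3", "structure": "model/detectors/YOLO_v3.py"},
    "hyper_parameters": {"learning_rate": 0.01, "epochs": 12},
    "dataset": {"train_ratio": 0.8},
    "policies": [{"id": "least_confidence", "params": {"top_k": 5}}],
}
incomplete_config = {"network_model": {"type": "yolo_v3"}, "policies": [{"params": {}}]}

ok_problems = validate_task_config(complete_config)
bad_problems = validate_task_config(incomplete_config)
```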

An embodiment of the present invention further provides an electronic device, as shown in fig. 13, including a processor 1301, a communication interface 1302, a memory 1303, and a communication bus 1304, where the processor 1301, the communication interface 1302, and the memory 1303 complete mutual communication through the communication bus 1304,

a memory 1303 for storing a computer program;

the processor 1301 is configured to implement the following steps when executing the program stored in the memory 1303:

The training method of the task model provided by the embodiment of the invention is applied to the machine learning engine service system, the model management module acquires corresponding target configuration information from the configuration information storage module based on a target task determined by a user, constructs and loads the target task model according to the target configuration information and model data of the network model stored in the model information storage module, calls the model training engine according to a target training flow of the target task, and trains the target task model based on a target data set loaded by the data management module. According to the training method of the task model provided by the embodiment of the invention, the configuration information of the task model comprises the relevant information required to be called when the task model is trained, when a user uses the task model, the corresponding configuration file can be directly called to train the model to obtain a desired result, and code migration is not required, so that the threshold of machine learning is reduced, and the robustness of machine learning is improved.

The embodiment of the present invention further provides an electronic device, as shown in fig. 14, which includes a processor 1401, a communication interface 1402, a memory 1403, and a communication bus 1404, wherein the processor 1401, the communication interface 1402, and the memory 1403 complete communication with each other through the communication bus 1404,

a memory 1403 for storing a computer program;

the processor 1401, when executing the program stored in the memory 1403, implements the following steps:

The configuration method of the task model provided by the embodiment of the invention is applied to the machine learning engine service system. The method first receives model data of each network model imported by a research and development user and stores the model data into the model information storage module, and then obtains the configuration information configured by the research and development user for each task model and stores the configuration information into the configuration information storage module, wherein the configuration information of each task model includes the structure information and hyper-parameters of the network model used by the task model and the configuration information of the data set used by the task model. According to the configuration method of the task model provided by the embodiment of the invention, the configuration information of the task model includes the relevant information required to be called when the task model is trained; when a user uses the task model, the corresponding configuration file can be directly called to train the model to obtain a desired result, and code migration is not required, so that the threshold of machine learning is reduced and the robustness of machine learning is improved.

The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.

The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.

In a further embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any one of the above-mentioned task model training or configuration methods.

In a further embodiment, the present invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method for training or configuring any of the above-described task models.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The embodiments in this specification are described in a related manner: identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, because the method and electronic device embodiments are substantially similar to the system embodiments, they are described only briefly; for the relevant points, refer to the corresponding parts of the description of the system embodiments.

The above description covers only the preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A machine learning engine service system, the system comprising:

the model training engine is used for training a network model;

2. The system of claim 1, further comprising: a policy information storage module and a policy management module;

3. The system of claim 2,

the system further comprises training interfaces, wherein each training interface corresponds to one task model and comprises a training process of the corresponding task model;

4. The system of claim 2, further comprising: a model configuration module;

5. The system of claim 1 or 2, further comprising: a data output module;

6. The system of claim 1, further comprising: one or more of a script presentation module, a tool module, and a log analysis module:

7. A training method for a task model, applied to the machine learning engine service system of claim 1, the method comprising:

8. The method of claim 7,

the system further comprises: a policy information storage module and a policy management module;

9. The method of claim 8,

10. The method according to claim 8 or 9,

wherein, in a case where the configuration information of the target task model contains policy information, calling the model training engine and the policy management module according to a preset training flow of the target task and training the target task model based on the target data set comprises:

receiving a training result fed back by the model training engine;

11. The method of claim 7, further comprising:

12. A method for configuring a task model, applied to the machine learning engine service system of claim 1, the method comprising:

13. The method of claim 12, wherein the system further comprises: a policy information storage module and a policy management module;

the method further comprises the following steps:

14. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor for implementing the method steps of any of claims 7-11 or 12-13 when executing a program stored in the memory.

15. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the method steps of any one of claims 7-11 or 12-13.
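
As a non-limiting illustration only, the following Python sketch shows one way the elements recited in claims 1 to 3 and the training method of claims 7 to 10 might fit together: a training interface is looked up by task model name, the policy management module is called only when the configuration information contains policy information, and the training result fed back by the model training engine is returned. Every class, function, and field name here (`TaskConfig`, `EngineService`, `clip01`, and so on) is hypothetical and does not appear in the specification, and the "training" step is a placeholder that merely averages the data set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TaskConfig:
    """Hypothetical configuration information of one task model."""
    task_model: str
    dataset: List[float]
    policy: Optional[str] = None  # optional policy information


class ModelTrainingEngine:
    """Stand-in for the model training engine; 'training' is only a mean."""

    def train(self, dataset: List[float]) -> Dict[str, float]:
        return {"score": sum(dataset) / len(dataset)}


class PolicyManagementModule:
    """Stand-in for the policy management module.

    The dict passed in plays the role of the policy information storage module.
    """

    def __init__(self, stored: Dict[str, Callable[[List[float]], List[float]]]):
        self._stored = stored

    def apply(self, name: str, dataset: List[float]) -> List[float]:
        return self._stored[name](dataset)


class EngineService:
    """Maps each task model to one training interface (a callable flow)."""

    def __init__(self, engine: ModelTrainingEngine, policies: PolicyManagementModule):
        self.engine = engine
        self.policies = policies
        self._interfaces: Dict[str, Callable[["EngineService", TaskConfig], Dict[str, float]]] = {}

    def register(self, task_model: str, flow) -> None:
        self._interfaces[task_model] = flow

    def train(self, config: TaskConfig) -> Dict[str, float]:
        # Select the training interface from the configuration, then run its flow.
        return self._interfaces[config.task_model](self, config)


def demo_flow(service: EngineService, config: TaskConfig) -> Dict[str, float]:
    """Hypothetical preset training flow for one task model."""
    data = config.dataset
    if config.policy is not None:
        # Configuration contains policy information: call the policy module first.
        data = service.policies.apply(config.policy, data)
    # Receive the training result fed back by the model training engine.
    return service.engine.train(data)


def clip01(values: List[float]) -> List[float]:
    """Example policy: clip every value into the range [0, 1]."""
    return [min(1.0, max(0.0, v)) for v in values]


service = EngineService(ModelTrainingEngine(), PolicyManagementModule({"clip01": clip01}))
service.register("demo_task", demo_flow)

plain = service.train(TaskConfig("demo_task", [0.0, 4.0]))
with_policy = service.train(TaskConfig("demo_task", [0.0, 4.0], policy="clip01"))
```

In this sketch, switching task models or enabling a policy changes only the `TaskConfig` values and never the calling code, which mirrors the stated aim of training from configuration information without code migration.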