CN112529167B - Neural network interactive automatic training system and method - Google Patents


Info

Publication number
CN112529167B
Authority
CN
China
Prior art keywords
model
training
data set
information
algorithm
Prior art date
Legal status
Active
Application number
CN202011563536.6A
Other languages
Chinese (zh)
Other versions
CN112529167A
Inventor
罗明宇
徐驰
林健
Current Assignee
Dongyun Ruilian Wuhan Computing Technology Co ltd
Original Assignee
Dongyun Ruilian Wuhan Computing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Dongyun Ruilian Wuhan Computing Technology Co., Ltd.
Priority to CN202011563536.6A
Publication of CN112529167A
Priority to PCT/CN2021/110156 (published as WO2022134600A1)
Application granted
Publication of CN112529167B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a neural network interactive automatic training system and method. The system comprises a data storage component, a data annotation component, a model training component and a model deployment component. The data annotation component is used for labeling an original data set provided by a user and generating an annotation file of the original data set; the model training component is used for automatically training a neural network based on the annotation file and generating a trained model; the model deployment component is used for deploying the trained model as an online reasoning service. With the invention, the user performs data annotation inside the neural network interactive automatic training system, the annotation result can be used directly for model training, and the training result can be deployed as an online service with one click, realizing a fully automatic flow from raw data to inference service.

Description

Neural network interactive automatic training system and method
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a neural network interactive automatic training system and method.
Background
Existing neural network automatic learning methods focus almost exclusively on the training stage: they use hyperparameter tuning, neural architecture search and similar techniques to train on general benchmark data sets and report evaluation results on those data sets. These methods are widely used in academic research, but suffer from the following problems in practical production applications:
The scope of automation is limited. Most existing neural network automatic learning targets training only and is usually carried out on a handful of general benchmark data sets. Because the data processing pipelines of different algorithms are not necessarily consistent and the annotation formats of different data sets differ, such methods are difficult to apply to data sets from real production environments. In addition, these automatic learning methods only produce trained model files, while model deployment itself still requires a degree of expertise, which further raises the barrier to applying automatic learning algorithms.
The data set is fixed. Benchmark data sets were introduced to provide a relatively fair criterion for algorithm evaluation, but they are not the ultimate goal of algorithm development. Existing neural network automatic learning algorithms depend too heavily on a few benchmark data sets; for example, some image classification architecture search algorithms cannot migrate a network searched on CIFAR, with its small data volume and low image resolution, to the larger and higher-resolution ImageNet data set, and vice versa. More importantly, production data changes continuously as the business evolves, with update cycles as short as a day or an hour, far shorter than the training cycles (days to months) of existing automatic learning methods. Existing automatic learning methods cannot exploit new data in time, and a model that takes even longer to train may no longer match the requirements of the actual production environment.
The training process is hard to intervene in. Mainstream automatic learning methods can be roughly divided into hyperparameter tuning and architecture search. Hyperparameter tuning repeatedly trains with one group of hyperparameters after another, chosen by Bayesian or similar algorithms, and models hyperparameter quality from hyperparameter-combination/training-effect pairs. Architecture search algorithms typically employ reinforcement learning or evolutionary algorithms that train repeatedly while continually adjusting the network structure. The training process of these methods has no notion of overall progress: hyperparameter tuning and architecture search can run indefinitely, and after an interruption the progress cannot be resumed from a breakpoint the way a single model can. For production development this is too much of a black box to intervene in.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The invention aims to provide a neural network interactive automatic training system and method that solve the above-mentioned problems of the prior art described in the background.
The neural network interactive automatic training system comprises a data storage component, a data labeling component, a model training component and a model deployment component:
The data labeling component is used for labeling an original data set provided by a user and generating a labeling file of the original data set;
the model training component is used for automatically training the neural network based on the annotation file and generating a trained model;
The model deployment component is used for deploying the trained model into an online reasoning service;
The data storage component is used for storing the generated annotation file and the trained model.
In addition, in order to achieve the above purpose, the present invention also provides a neural network interactive automatic training method, which comprises the following steps:
calling a data labeling component to label an original data set provided by a user, generating a labeling file of the original data set, and storing the labeling file in a data storage component;
automatically performing neural network training by a model training component according to the annotation file, generating a trained model, and inputting the trained model to the model deployment component;
and calling a model deployment component to deploy the trained model as an online reasoning service.
The invention has the beneficial effects that: the user performs data annotation in the neural network interactive automatic training system, the annotation result can be used for model training, and the training result can be deployed as an online service with one click, realizing a fully automatic flow from raw data to inference service.
Drawings
FIG. 1 is a schematic diagram of the components of a neural network interactive automatic training system, according to an embodiment of the present invention;
FIG. 2 is a flow chart of a neural network interactive automatic training method according to an embodiment of the present invention;
FIG. 2-1 is a flow diagram relating to an embodiment of data annotation in accordance with the present invention;
FIG. 3 is a block diagram of an embodiment of model auto-training according to the present invention;
FIG. 3-1 is a flow chart of a method corresponding to the overall planning module in a model training component according to an embodiment of the present invention;
FIG. 3-2 is a flow chart of a method corresponding to the data detection module in a model training component according to an embodiment of the present invention;
FIG. 3-3 is a flow chart of a method corresponding to the model scheduling module in a model training component according to an embodiment of the present invention;
FIG. 3-4 is a flow chart of a method corresponding to the model training module in a model training component according to an embodiment of the present invention;
FIG. 4 is a flow diagram of a method corresponding to a model deployment component according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Description of the technical terms:
Neural networks and the training of neural networks are explained first.
The neural network algorithm is an algorithm for carrying out characterization learning on data by taking an artificial neural network as a framework.
The neural network architecture is the structure of the neural network parameters and the relations among them; training the neural network means iteratively training it with a group of hyperparameters to obtain estimates of the neural network parameters.
Super parameters of the neural network include, but are not limited to, optimization algorithms of the neural network, learning rate, loss function, and the like.
The training process of a neural network feeds batches of input/output data pairs into the neural network algorithm, uses the error between the network's predicted output and the target output as the loss function, and trains by an iterative gradient-descent (or quasi-gradient-descent) method. Training ends when the loss falls below a threshold or when the change in the network parameters becomes sufficiently small; the resulting neural network structure and parameter set together constitute the neural network model.
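As an illustration only (not part of the claimed method), the general training procedure just described can be sketched as follows in TensorFlow; the model, data set, loss threshold and epoch limit are hypothetical placeholders.

    # Minimal sketch of generic neural network training (illustrative; model, data
    # and the loss-threshold stopping criterion are hypothetical placeholders).
    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])        # placeholder network
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    optimizer = tf.keras.optimizers.Adam()

    def train(dataset, loss_threshold=0.05, max_epochs=100):
        for epoch in range(max_epochs):
            epoch_loss = tf.keras.metrics.Mean()
            for x_batch, y_batch in dataset:                         # batched input/label pairs
                with tf.GradientTape() as tape:
                    logits = model(x_batch, training=True)
                    loss = loss_fn(y_batch, logits)                  # error between prediction and label
                grads = tape.gradient(loss, model.trainable_variables)
                optimizer.apply_gradients(zip(grads, model.trainable_variables))
                epoch_loss.update_state(loss)
            if epoch_loss.result() < loss_threshold:                 # loss-based stopping mark
                break
        return model                                                 # structure + parameters = the model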
With an automatic hyperparameter tuning algorithm or an architecture search algorithm, even a user who is not an expert in neural networks can automatically train a model with good performance on a given data set.
The automatic super-parameter tuning algorithm tries a plurality of groups of super-parameters based on a certain strategy, finds a group of super-parameters with the best effect under the condition that the neural network architecture is basically unchanged, and trains to obtain a corresponding neural network model; the architecture search algorithm is based on a certain strategy, and by continuously adjusting the architecture of the neural network, the optimal architecture of the neural network is found and trained to obtain a corresponding neural network model under the condition that the super parameters are basically unchanged.
The data pairs in neural network training are typically input data and labels, the data format of which is related to a particular task. For example, the input data of the image classification task is a picture, and is labeled as a class label corresponding to the picture. Before training the neural network, a plurality of pieces of sample data, each including one of the data pairs described above, need to be prepared. It will be appreciated that data preparation and labeling is a process independent of model training, but is a prerequisite for performing training.
The result of neural network training is a neural network model. The neural network model is typically stored on a computer as a serialized file of neural network parameters and structure, whose format depends on the implementation and framework of the neural network algorithm used. For example, a neural network model trained with the TensorFlow framework is typically a pb file. The process of predicting on new data using a trained neural network model is referred to as inference. Model inference is a process independent of model training, but it is a necessary step for applying the model to solve practical problems.
System embodiment of neural network interactive automatic training
A neural network interactive automatic training system according to an embodiment of the present invention will be described first with reference to the accompanying drawings. Referring to fig. 1, fig. 1 is a system for neural network interactive automatic training according to one embodiment of the present invention.
As shown in fig. 1, the system 10 comprises: a data storage component 100, a data annotation component 200, a model auto-training component 300, a model deployment component 400, and a user interaction component 500.
The data annotation component 200 is used for labeling an original data set provided by a user and generating an annotation file of the original data set;
the model training component 300 is configured to automatically perform neural network training based on the annotation file and generate a trained model;
The model deployment component 400 is mainly used for deploying the trained model as an online reasoning service.
The user interaction component 500 is used to provide a method for a user to interact with other components of the system.
The data storage component 100 is configured to store the generated annotation file and the trained model; specifically, the data storage component 100 is mainly used to store data assets such as data sets, annotation files, and neural network models.
According to this embodiment, the data annotation component 200 provides annotation services with different interfaces for different tasks. The data in the data storage component 100 is presented on the interface, the user completes the annotation, and the annotation file is generated in the data storage component 100, where it can be used directly for automatic training by the model training component 300 without manual handling. The model training component 300 automatically trains the neural network by reading the data sets and annotation files from the data storage component 100; it can adaptively change its strategy in response to data set changes, and it saves output assets such as model files back to the data storage component 100. The model deployment component 400 reads the trained models stored in the data storage component 100 and deploys them as inference services. The automatic training system as a whole covers the entire neural network development process from data annotation to model deployment, and features real-time data updating, strong feedback, extensibility, and so on. The user interaction component may be a set of Web-based user interfaces through which the user interacts with the other components. For example, on the interface the user can access the data storage component and perform file read/write operations; submit an annotation application to the data annotation component and complete data annotation on the interface; submit a training application to the model training component to perform automatic neural network training with the annotated data; and submit a deployment application to the model deployment component, which deploys the trained model as an online reasoning service.
In this embodiment, the user performs data annotation in the neural network interactive automatic training system, the annotation result can be used for model training, and the training result can be deployed as an online service with one click, realizing a fully automatic flow from raw data to inference service.
Optionally, the data annotation component 200, model training component 300 and model deployment component 400 in the system 10 can provide project management and version management. A project refers to algorithm development for a specific task, such as a flower identification task. Versions track changes in the data set, the training, and the deployment. Since in the development of a specific task the data set is constantly updated along with the business, the algorithm is continuously optimized and the corresponding model is continuously updated, one project may consist of multiple versions.
This embodiment provides a continuous data interaction method. The user can push updated data to the system at any time without suspending the model training process; the system automatically recognizes the change in the data and responds accordingly, for example by keeping the current algorithm training, changing the algorithm, or changing the strategy.
Neural network interactive automatic training method embodiment
Referring to fig. 2, corresponding to the system embodiment of fig. 1, the neural network interactive automatic training method includes the following steps:
Step S101, calling a data labeling component to label an original data set provided by a user, generating a labeling file of the original data set, and storing the labeling file in a data storage component;
Step S102, automatically performing neural network training by a model training component according to the annotation file, generating a trained model, and inputting the trained model to a model deployment component;
and step S103, calling a model deployment component to deploy the trained model into an online reasoning service.
< Method example 1>
In this embodiment, a method flow for data annotation is provided, which may be performed by the data annotation component 200 of the neural network interactive automatic training system.
As shown in fig. 2-1, the method for data annotation according to the present embodiment may include the following steps S201 to S203:
In step S201, the user stores the data set for training in the data storage component. The data storage component may be any file system, such as an object store. The data set storage needs to satisfy certain rules; for example, the data set for one task is stored under a separate path and is not mixed with other files.
Step S202, in response to a target task type selected by the user, determining a target data set catalog corresponding to the target task type, providing an annotation service based on the interface corresponding to that task type, and displaying the data set in the target data set catalog;
It can be appreciated that the user selects the task type and the catalog of the data set, and the data set is annotated in different ways in the user interface depending on the task type. As an example, for an image classification task, all picture files under the user-selected path may be presented on the interface, and the pictures are assigned to different labels by creating labels and checking them.
Step S203, generating a labeling file of the data set corresponding to the target task type;
Specifically, the system gathers the annotation results into an annotation file according to certain rules. For example, for an image classification task, the annotation information of all pictures can be stored in a json file, with the file path of each picture and its label stored in key:value form.
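As an illustration, a labeling file of this kind might be written as sketched below. The 'labels', 'annotations' and 'file_path' fields echo those mentioned later in this description, while the per-entry label key name and the file name are assumptions for illustration, not a schema fixed by the invention.

    # Hypothetical example of the generated annotation file for image classification.
    import json

    annotation = {
        "labels": ["rose", "tulip", "daisy"],                # all label names
        "annotations": [                                     # image file path / label pairs
            {"file_path": "images/0001.jpg", "label_name": "rose"},
            {"file_path": "images/0002.jpg", "label_name": "tulip"},
        ],
    }

    with open("annotations.json", "w", encoding="utf-8") as f:
        json.dump(annotation, f, ensure_ascii=False, indent=2)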
In this embodiment, the user performs data annotation in the neural network interactive automatic training system, the annotation result can be used for model training, and the training result can be deployed as an online service with one click, realizing a fully automatic flow from raw data to inference service.
< Method example 2>
In this embodiment, a method flow for model auto-training is provided, which may be performed by a model training component 300 of a neural network interactive auto-training system.
The method for automatic model training of this embodiment, shown in fig. 3, may include the following modules 310 to 360. The automatic model training method is used to automatically train a neural network model on a given task and data set (including the annotation file). In addition to the task type, the data set catalog and the model save path, the method can also receive the following information in order to support adaptive data updating and strong feedback on the training process.
Preferably, to improve the controllability of the training process, the input of the automatic model training method can further include an optional model evaluation index, an upper limit on training duration, and a lower limit on the evaluation index, used to formulate an overall plan of the training process. The model evaluation index describes the effect of the model; as an example, common evaluation indexes for an image classification task include classification accuracy, precision, recall, and so on. The input index is used as the single tracked index of the automatic training task and is the sole basis for algorithm selection and evaluation. The upper limit on training duration controls the maximum execution time of the training task and prevents the algorithm from running without bound: once the set upper limit is reached, the training task is ended immediately regardless of its state. The lower limit on the evaluation index likewise bounds the execution of the training task: the algorithm does not pursue maximization of the evaluation result endlessly, but stops after reaching the lower limit.
Preferably, to improve the flexibility, extensibility and transferability of the training task, the input may also receive a trained model produced by a previous training task as initialization information for a new training task. It will be appreciated that neural network training is based on a gradient descent algorithm, and training starts from an initialized set of network parameter values. When the network architectures are the same or similar, the parameters of a trained neural network model can be used to globally or locally initialize a brand-new neural network to be trained, which achieves knowledge transfer, shortens training time, and so on. Optionally, the migration includes not only migration of the neural network weights but also migration of the training schedule progress, which is described later in this embodiment.
According to the present disclosure and the optional and preferred methods described above, the methods used by the modules 310-360 according to this embodiment are as follows:
The overall planning module 310 is configured to receive planned training flow information, and send an instruction to other modules in the model training component based on the planned training flow information to execute a specific task;
It is appreciated that overall planning module 310 is the basic module of the overall training task for planning the training process and issuing instructions to other modules to perform the particular task. The overall planning module receives the following information: task type, data set path, evaluation index lower limit, training duration upper limit, and model path to be restored.
As an example, the overall planning module may follow the flow of steps S311-S316 shown in fig. 3-1:
Step S311, call the data detection module and wait for the data detection module to return the data set information. The definition and detailed description of the data set information are given in the embodiment of the data detection module 320. When the data detection module is called for the first time, if abnormal data information is returned, the exception is thrown.
Step S312, call the model scheduling module, passing in the data set information, the model path to be restored, the upper limit of training duration, and the time already spent. For the definition and detailed description of the model scheduling module, see the description of the model scheduling module 340 embodiment.
Step S313, detecting a variation of the data set in a certain manner. This step is performed as a relatively independent step after steps S311, S312. As an example, the data detection module may be invoked at intervals to detect data set information. When the data set detection is continuously carried out, if the data detection module returns abnormal data information, the information is not transmitted to the model scheduling module, otherwise, the information is transmitted to the model scheduling module.
Step S314, receiving the model training progress in a certain manner: current training information returned by the model scheduling module is received at a preset time interval, wherein the current training information includes the evaluation index designated by the user, and the evaluation index is used for planning model training;
specifically, this step is performed as a relatively independent step after steps S311, S312. As an example, the information returned by the model scheduling module may be received at intervals. Specifically, it may be the model names, the evaluation results, and the model checkpoint paths of all current models. As an example, for an image classification task, the evaluation result may be one or several of accuracy, precision, recall, F1 value, etc. It should be noted that model training uses the single evaluation index specified by the user to plan training, but there may be multiple feedback indexes.
Step S315, training information is output according to a certain mode. This step is performed as a relatively independent step after steps S311, S312. As an example, the output recipient may be an interactive component 500 in the neural network interactive automatic training system 10 that provides an interactive interface for displaying feedback information of the current model training. As an example, the output method may be to refresh all currently trained/trained completion model names and evaluation results at intervals.
Step S316, when the value of the evaluation index exceeds the lower limit or the training duration reaches the upper limit, the model saving module is called to save the model with the best evaluation result as an inference model that can be deployed as an inference service, and model training ends. The definition and detailed description of the model saving module are given in the model saving module embodiment section.
The data detection module 320 is configured to detect the data set information of the input data set directory and perform preliminary processing on the data set. As an example, for an image classification task the data set information contains: all image file paths actually present in the data storage component together with their labels, and all label classes. Information that can be inferred from this, such as the sample size of the data set and the number of classification categories, also belongs to the data set information. For the image classification task in this embodiment, the preliminary processing of the data set may be to parse the data set information and convert it into tfrecord files saved under the data set path. It should be noted that the data set information and data set processing described here summarize only part of the preprocessing of image classification data; the invention restricts neither the specific task class nor the specific representation of the data set information and the data processing manner.
As an example, the data detection module may follow the flow of steps S321-S325 shown in fig. 3-2:
In step S321, the data under the data set directory is detected. As an example, for an image classification task, it may be detected whether a file named 'annotations.json' exists, and if not, an exception is thrown.
Step S322, reading the annotation file and parsing its fields. As an example, for an image classification task, it may be required to contain 'annotations' and 'labels' fields; if either does not exist, an exception is thrown. 'labels' is a list containing all label strings, and 'annotations' is a list containing all image file path/label pairs, each with a 'file_path' and its label name.
In step S323, each image file path/label name pair (sample) is parsed. The image file path may be an absolute path of the image file in the data storage component or a relative path with respect to the data set path. In the relative-path case it is spliced with the data set path into an absolute path. When the absolute path of the file does not exist, or the label name does not appear in the list of all label names, the sample is regarded as abnormal; otherwise it is regarded as normal. All normal samples are retained.
In step S324, all normal sample data are processed. As an example, when using Tensorflow deep learning framework, the raw data may be converted to tfrecord files. For example, for an image file path, the file is read and saved as a binary string; for the tag name, find its index in the list of all tag names, i.e., convert the image file path/tag name to an image binary string/category index. The converted data is then stored as tfrecord files.
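A minimal sketch of this conversion step is given below, assuming TensorFlow is available; the feature names ("image", "label") and the output file name are illustrative assumptions, not prescribed by the invention.

    # Sketch: convert validated image path / label-name pairs into a tfrecord file (illustrative).
    import tensorflow as tf

    def write_tfrecord(samples, label_list, out_path="train.tfrecord"):
        """samples: iterable of (image_path, label_name) pairs already validated as normal."""
        with tf.io.TFRecordWriter(out_path) as writer:
            for image_path, label_name in samples:
                image_bytes = tf.io.read_file(image_path).numpy()    # image file as a binary string
                label_index = label_list.index(label_name)           # label name -> category index
                example = tf.train.Example(features=tf.train.Features(feature={
                    "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
                    "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label_index])),
                }))
                writer.write(example.SerializeToString())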
Step S325, the data set information is returned to the module caller, the overall planning module. As an example, the dataset information may include tfrecord file paths, dataset sample size, tag list, category number, as per the steps described above.
The data loading module 330 is used to actually load the data required for training from the data storage component and to execute data preprocessing specific to a given algorithm. The data loading module depends on the specific algorithm, so it is invoked only by the model training module. As an example, according to the data set information generated by the data detection module, the data loading module reads the tfrecord files, executes a specific data processing flow, and returns a data iterator. It will be appreciated that the above is merely an example of loading and processing data in neural network training; the invention is not limited to a particular implementation.
The model scheduling module 340 is the overall scheduling module for model training and is used for planning the model training process. Specifically, the model scheduling module has a predefined default training flow for each task. As an example, for an image classification task the model scheduling module predefines an algorithm pool containing a series of preset algorithms: reference algorithms, a hyperparameter tuning algorithm, an architecture search algorithm, and an integration algorithm. The reference algorithms are classical neural network algorithms with fixed hyperparameters and fixed architecture, such as ResNet and MobileNet. The hyperparameter tuning algorithm keeps the network architecture of a reference algorithm but treats the hyperparameters as variables; the hyperparameters include the learning rate schedule, the number of iteration periods, the batch size, the optimizer, and so on. The architecture search algorithm takes part of the network structure of a reference algorithm (also called a module) as the architecture search space and trains with the hyperparameters unchanged. For example, for the reference algorithm ResNet, the partial network structure may be the bottleneck module; the architecture search algorithm stacks such modules in a branched or tower structure to form a complete neural network. The integration algorithm integrates any number of trained models into one overall model, for example by averaging the pre-activation output values of the output layers of the neural networks to be integrated and then applying the activation as the output of the integrated model.
The model scheduling module 340 only plans the training schedule of the models and is not responsible for the actual model training. When a model is to be trained, the model scheduling module calls the model training module and passes in the information required for training (the algorithm name and the data set information). As an example, for an image classification task, the model scheduling module may follow the flow of steps S3401-S3411 shown in fig. 3-3:
Step S3401, receiving the data set information, the model path to be restored, the upper limit of training duration, and the time already spent, all passed in by the overall planning module.
Step S3402, storing the trained model information by initializing a model pool; the model information comprises an algorithm name of the model, an algorithm type, an evaluation result of the model, a model file path of the model, whether the model is trained or not, and a label name list corresponding to the model output;
judging the state of the model information:
step S3403, if the model information to be restored exists, the model information is placed in the model pool.
Step S3404, if there is no model to be restored, the training order of the reference algorithms is determined by their FLOPs: algorithms with lower FLOPs have higher priority. For example, MobileNet has fewer FLOPs than ResNet, so it is trained before ResNet. To make full use of the time budget, the model scheduling module dynamically allocates the number of algorithms trained in parallel based on the available GPUs and memory space, but in general algorithms with lower FLOPs still come first, until all reference algorithms in the algorithm pool have been trained.
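A minimal sketch of this FLOPs-based ordering follows; the FLOPs figures are rough illustrative values, not measurements from the invention.

    # Sketch: order reference algorithms so that lower-FLOPs networks are trained first.
    reference_algorithms = {        # rough illustrative GFLOPs per image, not authoritative
        "ResNet50": 4.1,
        "MobileNetV2": 0.3,
        "ResNet101": 7.8,
    }

    training_order = sorted(reference_algorithms, key=reference_algorithms.get)
    print(training_order)           # ['MobileNetV2', 'ResNet50', 'ResNet101']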
Step S3405, if there is a model to be restored, the training order of the reference algorithms is determined from the trained models to be restored. When the progress to be restored contains only reference models, the evaluation results of all trained reference models on the previous version of the data set are sorted. Specifically, if the evaluation index is one where higher is better, the algorithm with the higher result comes first; if it is one where lower is better, the algorithm with the lower result comes first. For example, in image classification, with classification accuracy as the evaluation index the algorithm with the higher accuracy comes first, whereas with classification cross-entropy loss as the index the algorithm with the lower loss comes first. When the models to be restored include a hyperparameter tuning model or an architecture search model, each reference architecture is ranked by the highest evaluation result among its reference model and the corresponding hyperparameter tuning and architecture search models, and this ranking is used as the reference algorithm training order. For example, if the models to be restored include a reference ResNet, a hyperparameter-tuned ResNet, an architecture-searched ResNet and a reference MobileNet, the evaluation index is classification accuracy, and the results are ordered architecture-searched ResNet > reference MobileNet > hyperparameter-tuned ResNet > reference ResNet, then the reference algorithm training order is ResNet -> MobileNet. The remaining reference algorithms are still ranked from low to high FLOPs. Unlike the case with no model to restore, when the model training module is invoked, in addition to the name of the reference algorithm to be trained, if a model to be restored exists for that reference algorithm (whether or not its training was completed), the path of the model to be restored is additionally passed in. As above, the reference algorithms are trained sequentially or in parallel until all reference algorithm training is completed.
Step S3406, after all reference algorithms have been trained, the algorithm with the best evaluation result is selected, and on top of this reference algorithm, training of the hyperparameter tuning algorithm and/or architecture search algorithm begins. An architecture search algorithm often requires a lot of training time, so in this embodiment the choice between hyperparameter tuning and architecture search may be made according to the remaining training time. For example, if the remaining time is less than 2 hours, the hyperparameter tuning algorithm is trained; otherwise the architecture search algorithm is trained. Training continues until the hyperparameter tuning algorithm has traversed the predefined hyperparameter search space (if the space is finite), or the architecture search algorithm has traversed the predefined architecture search space (if the space is finite).
Step S3407, if during the whole model training period there are at least two trained models in the model pool whose label lists are identical to the label list of the current data set information, and there are spare resources, those resources are used to train the integrated model.
Step S3408, at the start of each model training, the algorithm information and data set information are passed to the model training module. In this embodiment the algorithm information includes the algorithm name and the model path to be restored, and the data set information includes the tfrecord file paths, the sample size, the label name list, and the category number. During each model training, the model training module returns the training status of the current model in a certain manner, for example returning at intervals the current model evaluation result, the model checkpoint save path, and a signal that the model is not yet trained to completion. After training it returns the final model evaluation result and the model save path, together with a model training completion signal. For a detailed description of model training, see the model training module embodiment section.
Step S3409, if the data set information change signal transmitted by the overall planning module is received during the model training period, all currently running model training modules are stopped. The current training progress is initialized and training proceeds again in the order of the reference algorithms; if the model pool already contains an algorithm with the same name, the checkpoint path of that algorithm's model to be restored (whether or not the model finished training) is additionally passed to the model training module.
Step S3410, when an exception thrown by any model training module is received, the exception is only printed, and training of that model is skipped.
Step S3411, returning the training progress to the overall planning module according to a certain method. For example, the model pools are synchronized to the overall planning module at intervals.
The model training module 350 is the module that actually trains with a specific algorithm, which may be a reference algorithm, a hyperparameter tuning algorithm, an architecture search algorithm, or an integration algorithm. When this module receives an instruction from the model scheduling module, it constructs a specific neural network according to the algorithm information and data set information passed in, and calls the data loading module to load and preprocess the data set for training. As an example, for an image classification task, the model training module may follow the flow of steps S351-S356 shown in fig. 3-4:
Step S351, receiving the algorithm information of the specified algorithm and the data set information; the algorithm information comprises the algorithm name, the algorithm type and the model path to be restored; the data set information includes the tfrecord file paths, the sample size, the label name list, and the category number.
Step S352, constructing a neural network according to the algorithm name, the algorithm type and the category number, initializing the neural network parameters, and initializing an optimizer.
Step S353, if a model path to be restored exists, loading and parsing the model's neural network structure, parameters and optimizer state, and restoring as many of the neural network parameters as possible according to the network structure. For example, when the user changes the data set so that the number of image classification categories changes, this information is passed through the data detection module, the overall planning module and the model scheduling module to the model training module; at that point the parameter dimensions of the last fully connected layer of the to-be-restored network of the same-name reference algorithm no longer match the current neural network, and the parameters of all layers except the last layer of the network to be restored are assigned to the parameters of the corresponding layers of the current neural network.
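One possible realization of this partial restore is sketched below, assuming Keras-style models whose layers correspond by position; it copies weights layer by layer and skips any layer whose shapes no longer match (for example, the final fully connected layer after a class-count change). It is an illustrative sketch, not the claimed implementation.

    # Sketch: restore as many parameters as possible from a saved model into a new network.
    import tensorflow as tf

    def partial_restore(new_model, checkpoint_path):
        old_model = tf.keras.models.load_model(checkpoint_path)
        for new_layer, old_layer in zip(new_model.layers, old_model.layers):
            old_weights = old_layer.get_weights()
            new_weights = new_layer.get_weights()
            if len(old_weights) == len(new_weights) and all(
                o.shape == n.shape for o, n in zip(old_weights, new_weights)
            ):
                new_layer.set_weights(old_weights)   # shapes match: copy the parameters
            # otherwise (e.g. the last fully connected layer after a class-count change)
            # keep the freshly initialized parameters of the new network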
Step S354, calling the data loading module to read the tfrecord files and construct a data pipeline suited to the current specific algorithm.
Step S355, training each type of neural network according to its preset method. Specifically, the different types of algorithms may use the following example settings:
Reference algorithm: the method comprises the steps of training for 100 periods by default (traversing 100 training sets), dynamically determining the batch size according to resource conditions (memory and video memory), and setting the iteration times of each period as the sample size/batch size. Adam optimizer is used by default and the learning rate is attenuated to one tenth of the original at the 60 th and 90 th epochs. And after each period is finished, evaluating the verification set, and storing a model with the optimal evaluation result of the verification set. Using the early termination strategy, training is ended when the validation set evaluation results are not elevated for consecutive periods.
Hyperparameter tuning algorithm: the training performed with each group of hyperparameters is called a trial. The training settings of each trial are consistent with the reference algorithm described above. Hyperparameters are sampled from a preset hyperparameter search space using a Bayesian algorithm, 50 samples by default. By default the Hyperband algorithm is used to control the trial progress; poor trials can be stopped early based on their intermediate results to speed up the search.
Architecture search algorithm: the training performed with each combination of architecture modules is called a trial. The training settings of each trial are consistent with the reference algorithm described above. Architecture combinations are sampled and validated in a certain way, 50 samples by default. The Hyperband algorithm is used to control the trial progress, as in the hyperparameter tuning algorithm described above.
Integration algorithm: and weighting the unactivated values of the last layer of all the models participating in the integration, wherein the weight is a learnable parameter, and outputting the weighted result as a model after activating. And training the integrated model, wherein the training setting is consistent with the reference algorithm.
Step S356, in each training period, training information for the currently specified algorithm is returned to the model scheduling module, including: the algorithm name, the algorithm category, the validation set evaluation result, the model checkpoint path, whether the model has finished training (finished at the last period or by early termination; otherwise unfinished), and the label name list from the data set information.
This embodiment provides a controllable training process. The user may set overall requirements on the training process, such as a time threshold and a lower bound on performance. The automatic learning algorithm combines classical networks, hyperparameter tuning, architecture search and ensemble learning in a serial-parallel manner, and the training process can be interrupted and resumed at any time without affecting the training progress. The training process has version management and history records, and a new version can be restored from any previous version, so knowledge transfer and extension to arbitrary training durations are easy. The training process gives meaningful real-time feedback: the user can observe which strategies influence the algorithm's effect and how far the current algorithm's performance is from the target expectation.
The model saving module 360 is called by the overall planning module when it judges that training meets the stop condition, receiving the current model pool, the data set information and the model save path designated by the user. The model saving module finds the model in the pool with the best current evaluation index whose output label list is consistent with the label list of the current data set information, and saves it as an inference model that can be used directly for inference. For example, when the model file is directly usable for inference, it is simply copied to the user's model save path; when additional processing is needed, the model can first be loaded, processed accordingly, and then saved to the user's model save path.
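A minimal sketch of this selection step follows; the model-pool record keys and the copy-based export are hypothetical illustrations of the behavior described above.

    # Sketch: pick the best trained model whose label list matches the current dataset, then export it.
    import shutil

    def save_best_model(model_pool, dataset_labels, user_model_path, higher_is_better=True):
        candidates = [m for m in model_pool
                      if m["trained"] and m["label_list"] == dataset_labels]
        if not candidates:
            raise RuntimeError("no trained model matches the current label list")
        pick = max if higher_is_better else min
        best = pick(candidates, key=lambda m: m["metric"])
        # directly usable for inference: copy the checkpoint directory to the user save path
        shutil.copytree(best["checkpoint_path"], user_model_path, dirs_exist_ok=True)
        return best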
It should be noted that in this embodiment the term "data set information" has a different meaning in different modules and their interactions, as explained below:
For the data detection module: the extracted data set information contains information describing the state of the data set (which varies with the task type), such as the original data file paths (when the data detection module produces no additional data file output) or the preprocessed data file paths (when it does). This information is passed to the overall planning module.
For the overall planning module: the id of the current data set is generated from the current data set information and the historical data set information (if any). The data set id is a flag that defines whether the data has changed qualitatively. For example, for an image classification task, when the overall label set is unchanged, an increase or decrease in the number of samples does not affect the nature of the task and is a quantitative change; when a label changes, the nature of the task changes, which is a qualitative change. A quantitative change does not affect the model training progress, while a qualitative change is passed by the overall planning module to the model scheduling module to re-plan the training schedule. When the overall planning module passes information to the model scheduling module, it may include, besides the data set information received from the data detection module, the data set id and a data set id change signal (one possible id scheme is sketched after this explanation).
For the model scheduling module, the "data set information" sent to the model training module is the same as the "data set information" sent to the model scheduling module by the overall planning module.
For the model training module, the "data set information" returned to the model scheduling module refers to "data set id".
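One possible way to derive such a data set id is sketched below, under the assumption that for an image classification task the label set alone determines a qualitative change; the hashing scheme is an illustration, not part of the claimed method.

    # Sketch: derive a dataset id that changes only when the task changes qualitatively
    # (here: when the label set of an image classification dataset changes).
    import hashlib

    def dataset_id(label_list):
        canonical = ",".join(sorted(label_list))
        return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

    old_id = dataset_id(["rose", "tulip"])
    new_id = dataset_id(["tulip", "rose", "daisy"])   # a new label => different id => re-plan training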
According to the method of this embodiment, the cooperation of the overall planning module 310, the data detection module 320 and the model scheduling module 340 allows the data set and the data annotation to be updated during training, and model training can detect and adapt to such changes. Through the cooperation of the model scheduling module 340 and the model training module 350, the training progress and the resources of the automatic neural network training process can be fully utilized. When the data change, the schedule can be adjusted dynamically and trained model information is reused to the greatest extent, which improves both the effect and the efficiency of automatic neural network learning.
< Method example 3>
In this embodiment, a method for performing neural network model deployment is provided, which may be performed by a model deployment component 400 of a neural network interactive automatic training system.
As shown in fig. 4, the method for performing neural network model deployment of the present embodiment may include the following steps S401 to S406:
In step S401, the neural network model (trained model) generated by automatic model training is acquired. On first deployment, the model file path is specified externally, for example by the user through the user interaction component 500 of the neural network interactive automatic training system of the present invention. In particular, the model may exist as a single file or as a series of files related to model inference; the invention does not limit the file format or the number of model files.
Step S402, judging whether the neural network model (trained model) is acquired for the first time, if so, executing step S404, otherwise, executing step S403.
Step S403, judging whether the neural network model is updated, if not, executing step S402 to continuously detect the model updating condition, otherwise, executing step S404.
Step S404, loading the neural network model with a neural network inference engine and deploying it as an inference service. In particular, the inference service may provide access in any manner, for example through a REST API. The invention is not limited to a particular neural network inference engine, which may be TensorFlow Serving, MLflow, or the like.
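As an illustration of accessing such a REST-style inference service, the sketch below posts a request in the format used by TensorFlow Serving's REST API; the host, port and model name are hypothetical, and other engines would expose different endpoints.

    # Sketch: query an online inference service over its REST API (hypothetical host/model name).
    import json
    import urllib.request

    def predict(instances, host="localhost", port=8501, model_name="flower_classifier"):
        url = f"http://{host}:{port}/v1/models/{model_name}:predict"
        payload = json.dumps({"instances": instances}).encode("utf-8")
        request = urllib.request.Request(url, data=payload,
                                         headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())["predictions"]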
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (5)

1. A neural network interactive automatic training system, comprising a data storage component, a data labeling component, a model training component, and a model deployment component:
The data labeling component is used for labeling an original data set provided by a user and generating a labeling file of the original data set;
the model training component is used for automatically training the neural network based on the annotation file and generating a trained model;
The model deployment component is used for deploying the trained model into an online reasoning service;
The data storage component is used for storing the generated annotation file and the trained model;
the data storage component comprises a data set catalog, wherein the data set catalog in the data storage component is used for storing a data set appointed by a user in advance, and the data set catalog has a corresponding relation with a task type;
The data annotation component is further used for responding to a target task type selected by a user, determining a target data set catalog corresponding to the target task type, providing annotation service based on an interface corresponding to the task type for the target task type, and displaying a data set in the target data set catalog; generating a labeling file of a data set corresponding to the target task type, and storing the labeling file into the data storage component;
The model training assembly comprises an overall planning module, a data detection module, a data loading module, a model training module and a model storage module;
The data detection module is used for detecting data set information in a data set catalog and carrying out preliminary processing on the data set; the data set information extracted by the data detection module comprises an original data file path or a preprocessed data file path, and is used for representing information of data set conditions, the information of the data set conditions is different according to task types, and the information of the data set conditions is transmitted to the overall planning module by the data detection module;
The overall planning module is used for receiving planning training flow information and, based on the planning training flow information, sending instructions to the other modules in the model training component to execute specific tasks; the planning training flow information comprises at least one of a task type, a data set path, an evaluation index, an evaluation index lower limit, a training duration upper limit, and a model path to be restored; the overall planning module generates an id of the current data set according to the current data set information and/or historical data set information, the data set id serving as a marker of whether the data has substantively changed; the data loading module is used for loading the data required for training according to the data set information and performing the data preprocessing associated with a specified algorithm to obtain a preprocessed data set;
The model scheduling module is used for planning the model training process, wherein each task type has a corresponding default training process; the data set information sent by the model scheduling module to the model training module is the same as the data set information sent by the overall planning module to the model scheduling module; the model training module is used for constructing a neural network according to the algorithm information and data set information of the specified algorithm and for loading and training on the preprocessed data set, wherein the data set information returned by the model training module to the model scheduling module is the data set id;
The step of the overall planning module receiving the planning training flow information and, based on the planning training flow information, sending instructions to the other modules in the model training component to execute specific tasks specifically comprises the following steps:
invoking the data detection module and waiting for the data detection module to feed back data set information;
transmitting the data set information, the model path to be restored, the training duration upper limit, and the time already spent to the model scheduling module;
invoking the data detection module at a preset time interval to detect data set information, wherein, during continuous data set detection, if the data set information returned by the data detection module is abnormal, the information is not transmitted to the model scheduling module, and otherwise the information is transmitted to the model scheduling module;
receiving the model training progress, namely receiving, at a preset time interval, the current training information returned by the model scheduling module, wherein the current training information comprises the evaluation index designated by the user, and the evaluation index is used for planning model training;
when the value of the evaluation index exceeds its lower limit or the training duration reaches its upper limit, invoking the model storage module to store the model with the best evaluation result as an inference model that can be deployed as an inference service, and ending model training;
Correspondingly, the step of the model scheduling module starting the model training module, constructing a neural network according to the algorithm information and data set information of the specified algorithm, and loading and training on the preprocessed data set specifically comprises the following steps:
Receiving algorithm information and data set information of a specified algorithm; the algorithm information comprises an algorithm name, an algorithm type and a model path to be restored;
constructing a neural network according to the algorithm name and the algorithm type, initializing parameters of the neural network, and initializing an optimizer;
if a model path to be restored exists, loading the model and parsing its neural network structure, parameters, and optimizer state;
Calling a data loading module, reading a data file, and constructing a data pipeline suitable for the data file according to a current specific algorithm;
training the constructed neural network according to the preset method corresponding to that network;
Wherein each period of training returns training information for the specified algorithm to the model scheduling module, the training information comprising: algorithm name, algorithm category, validation set evaluation result, model checkpoint path, whether model training is complete, and dataset information;
The step of the overall planning module invoking the model scheduling module to plan the model training process specifically comprises the following steps:
receiving the data set information, the model path to be restored, the training duration upper limit, and the time already spent, which are transmitted by the overall planning module;
storing trained model information by initializing a model pool; the model information comprises an algorithm name, an algorithm type, the evaluation result of the model, the model file path of the model, whether the model has been trained, and the data set id corresponding to the model output;
judging the state of the model information:
if model information to be restored exists, placing the model information to be restored in the model pool;
if no model to be restored exists, determining the reference algorithm training order according to the FLOPs of each reference algorithm;
If a model to be restored exists, determining the reference algorithm training order according to the trained models to be restored: when the models to be restored include only reference models, ordering them according to the evaluation results of all trained reference models on the previous version of the data set; when the models to be restored include a hyperparameter tuning model or an architecture search model, taking the reference model corresponding to the highest evaluation result among the reference models, their corresponding hyperparameter tuning models, and their corresponding architecture search models as the reference algorithm training order; and training the reference algorithms in order until all reference algorithms have been trained;
After all the reference algorithms have been trained, selecting the reference algorithm with the best evaluation result and starting to train the hyperparameter tuning algorithm and/or architecture search algorithm based on that reference algorithm;
during the whole model training period, if the model pool contains at least two trained models whose data set id in the model information equals the id of the current data set and spare resources exist, training an ensemble model with the spare resources;
At the start of each model training, transmitting the algorithm information and data set information to the model training module; after the model training is finished, the final model evaluation result and the model storage path are fed back, together with a model training completion signal;
During the whole model training period, if a data set information change signal transmitted by the overall planning module is received, stopping all currently running model training modules, resetting the current training progress, and training again in the reference algorithm order; if the model pool contains an algorithm of the same name, additionally transmitting that algorithm's model checkpoint path to be restored to the model training module;
when an exception thrown by any model training module is received, logging the exception and skipping the training of that model;
and feeding back the model training progress to the overall planning module.
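The control behaviour of the overall planning module recited in claim 1 can be illustrated, purely as an editorial sketch, by the Python fragment below. Every class, method, and field name (OverallPlanner, detector.detect, scheduler.progress, metric_lower_limit, and so on) is a hypothetical stand-in; the claim does not prescribe any concrete API.

import time

# Sketch of the overall planning module's loop: poll the data detection
# module at a fixed interval, forward normal data set information to the
# model scheduling module, and stop when the user-specified evaluation
# index clears its lower limit or the training time budget runs out.
class OverallPlanner:
    def __init__(self, detector, scheduler, saver, plan):
        self.detector = detector      # data detection module
        self.scheduler = scheduler    # model scheduling module
        self.saver = saver            # model storage module
        self.plan = plan              # task type, data set path, evaluation index,
                                      # index lower limit, time limit, restore path
        self.poll_interval = 30       # seconds between status checks (assumed)

    def run(self):
        start = time.time()
        dataset_info = self.detector.detect(self.plan["dataset_path"])
        self.scheduler.start(dataset_info=dataset_info,
                             restore_path=self.plan.get("restore_path"),
                             time_limit=self.plan["time_limit"],
                             time_spent=0.0)
        while True:
            time.sleep(self.poll_interval)
            elapsed = time.time() - start

            # Re-detect the data set; forward it only if it is not abnormal.
            new_info = self.detector.detect(self.plan["dataset_path"])
            if new_info is not None:
                self.scheduler.update_dataset(new_info)

            # Current training information includes the user-chosen evaluation index.
            status = self.scheduler.progress()

            # Stop once the index exceeds its lower limit or time is exhausted.
            if (status["best_metric"] >= self.plan["metric_lower_limit"]
                    or elapsed >= self.plan["time_limit"]):
                self.saver.save_as_inference_model(self.scheduler.best_model())
                break

The loop deliberately keeps the planner free of any training logic; it only routes information between the detection, scheduling, and storage modules, mirroring the division of labour in the claim.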
2. The system of claim 1, wherein the model deployment component is configured to retrieve the trained model from the data storage component; and, when the trained model is acquired for the first time and does not need to be updated, to load the trained model with a neural network reasoning engine and deploy it as a reasoning service.
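As an editorial illustration of the deployment behaviour in claim 2 (load the trained model once, when it is first acquired and no update is needed, then serve it online), the sketch below uses ONNX Runtime and Flask as stand-ins for the unnamed reasoning engine and serving layer; the model path and endpoint are likewise assumptions.

import numpy as np
import onnxruntime as ort
from flask import Flask, jsonify, request

MODEL_PATH = "/models/best_model.onnx"   # hypothetical path in the data storage component

app = Flask(__name__)
_session = None  # loaded lazily on the first request


def get_session():
    # Load the trained model only once and reuse it for later requests.
    global _session
    if _session is None:
        _session = ort.InferenceSession(MODEL_PATH)
    return _session


@app.route("/predict", methods=["POST"])
def predict():
    session = get_session()
    inputs = np.asarray(request.get_json()["inputs"], dtype=np.float32)
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: inputs})
    return jsonify({"outputs": outputs[0].tolist()})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)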
3. A neural network interactive automatic training method, characterized in that the method comprises the following steps:
calling a data labeling component to label an original data set provided by a user, generating a labeling file of the original data set, and storing the labeling file in a data storage component;
automatically performing neural network training by a model training component according to the annotation file, generating a trained model, and inputting the trained model to the model deployment component;
invoking a model deployment component to deploy the trained model as an online reasoning service;
The model training component comprises an overall planning module, a data detection module, a data loading module, a model scheduling module, a model training module, and a model storage module;
The model training component automatically carries out neural network training according to the annotation file, and generates a trained model, which specifically comprises the following steps:
Invoking the data detection module to detect data set information in the data set catalog and to perform preliminary processing on the data set; the data set information extracted by the data detection module comprises an original data file path or a preprocessed data file path and is used for representing the condition of the data set; the information on the data set condition differs according to the task type and is transmitted by the data detection module to the overall planning module;
Receiving, by the overall planning module, planning training flow information and, based on the planning training flow information, sending instructions to the other modules in the model training component to execute specific tasks; the planning training flow information comprises at least one of a task type, a data set path, an evaluation index, an evaluation index lower limit, a training duration upper limit, and a model path to be restored; the overall planning module generates an id of the current data set according to the current data set information and/or historical data set information, the data set id serving as a marker of whether the data has substantively changed; and loading, through the data loading module, the data required for training according to the data set information and performing the data preprocessing associated with a specified algorithm to obtain a preprocessed data set;
invoking the model scheduling module to plan a model training process, wherein each task type has a corresponding default training process; the data set information sent by the model scheduling module to the model training module is the same as the data set information sent by the overall planning module to the model scheduling module;
starting the model training module, constructing a neural network according to algorithm information and data set information of the specified algorithm, and loading and training the preprocessed data set, wherein the data set information returned by the model training module to the model scheduling module is a data set id;
The step of the overall planning module receiving the planning training flow information and, based on the planning training flow information, sending instructions to the other modules in the model training component to execute specific tasks specifically comprises the following steps:
invoking the data detection module and waiting for the data detection module to feed back data set information;
transmitting the data set information, the model path to be restored, the training duration upper limit, and the time already spent to the model scheduling module;
invoking the data detection module at a preset time interval to detect data set information, wherein, during continuous data set detection, if the data set information returned by the data detection module is abnormal, the information is not transmitted to the model scheduling module, and otherwise the information is transmitted to the model scheduling module;
receiving the model training progress, namely receiving, at a preset time interval, the current training information returned by the model scheduling module, wherein the current training information comprises the evaluation index designated by the user, and the evaluation index is used for planning model training;
when the value of the evaluation index exceeds its lower limit or the training duration reaches its upper limit, invoking the model storage module to store the model with the best evaluation result as an inference model that can be deployed as an inference service, and ending model training;
Correspondingly, the step of the model scheduling module starting the model training module, constructing a neural network according to the algorithm information and data set information of the specified algorithm, and loading and training on the preprocessed data set specifically comprises the following steps:
Receiving algorithm information and data set information of a specified algorithm; the algorithm information comprises an algorithm name, an algorithm type and a model path to be restored;
constructing a neural network according to the algorithm name and the algorithm type, initializing parameters of the neural network, and initializing an optimizer;
if a model path to be restored exists, loading the model and parsing its neural network structure, parameters, and optimizer state;
Calling a data loading module, reading a data file, and constructing a data pipeline suitable for the data file according to a current specific algorithm;
training the constructed neural network according to the preset method corresponding to that network;
Wherein each period of training returns training information for the specified algorithm to the model scheduling module, the training information comprising: algorithm name, algorithm category, validation set evaluation result, model checkpoint path, whether model training is complete, and dataset information;
The step of the overall planning module invoking the model scheduling module to plan the model training process specifically comprises the following steps:
receiving the data set information, the model path to be restored, the training duration upper limit, and the time already spent, which are transmitted by the overall planning module;
storing trained model information by initializing a model pool; the model information comprises an algorithm name, an algorithm type, the evaluation result of the model, the model file path of the model, whether the model has been trained, and the data set id corresponding to the model output;
judging the state of the model information:
if model information to be restored exists, placing the model information to be restored in the model pool;
if no model to be restored exists, determining the reference algorithm training order according to the FLOPs of each reference algorithm;
If a model to be restored exists, determining the reference algorithm training order according to the trained models to be restored: when the models to be restored include only reference models, ordering them according to the evaluation results of all trained reference models on the previous version of the data set; when the models to be restored include a hyperparameter tuning model or an architecture search model, taking the reference model corresponding to the highest evaluation result among the reference models, their corresponding hyperparameter tuning models, and their corresponding architecture search models as the reference algorithm training order; and training the reference algorithms in order until all reference algorithms have been trained;
After all the reference algorithms have been trained, selecting the reference algorithm with the best evaluation result and starting to train the hyperparameter tuning algorithm and/or architecture search algorithm based on that reference algorithm;
during the whole model training period, if the model pool contains at least two trained models whose data set id in the model information equals the id of the current data set and spare resources exist, training an ensemble model with the spare resources;
At the start of each model training, transmitting the algorithm information and data set information to the model training module; after the model training is finished, the final model evaluation result and the model storage path are fed back, together with a model training completion signal;
During the whole model training period, if a data set information change signal transmitted by the overall planning module is received, stopping all currently running model training modules, resetting the current training progress, and training again in the reference algorithm order; if the model pool contains an algorithm of the same name, additionally transmitting that algorithm's model checkpoint path to be restored to the model training module;
when an exception thrown by any model training module is received, logging the exception and skipping the training of that model;
and feeding back the model training progress to the overall planning module.
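The scheduling rules spelled out in claims 1 and 3 — a model pool, reference algorithms ordered by FLOPs when nothing is restored, hyperparameter tuning or architecture search started from the best reference algorithm, and an ensemble model trained only when at least two finished models share the current data set id and spare resources exist — can be condensed into the following editorial sketch. The dictionaries and callables are illustrative assumptions only.

# Illustrative sketch of the model scheduling module; all names are assumed.
def schedule(reference_algorithms, dataset_id, restored_models=None, spare_resources=False):
    """reference_algorithms: dicts with 'name', 'flops' and a 'train' callable."""
    model_pool = list(restored_models or [])   # records of trained models

    if not restored_models:
        # No model to restore: train reference algorithms in order of FLOPs.
        order = sorted(reference_algorithms, key=lambda a: a["flops"])
    else:
        # Resuming: order by evaluation results on the previous data set version.
        evaluated = {m["algorithm"]: m["metric"] for m in restored_models}
        order = sorted(reference_algorithms,
                       key=lambda a: evaluated.get(a["name"], float("-inf")),
                       reverse=True)

    for algo in order:
        try:
            result = algo["train"](dataset_id)   # returns metric, checkpoint path, ...
        except Exception as exc:
            print(f"training of {algo['name']} failed: {exc}")  # log and skip this model
            continue
        model_pool.append({"algorithm": algo["name"], "metric": result["metric"],
                           "dataset_id": dataset_id, "done": True})

    # Hyperparameter tuning / architecture search would start from the best model.
    best = max(model_pool, key=lambda m: m["metric"]) if model_pool else None

    # Ensemble training only with >= 2 finished models on the current data set
    # and spare resources available.
    current = [m for m in model_pool if m["done"] and m["dataset_id"] == dataset_id]
    if spare_resources and len(current) >= 2:
        pass  # launch ensemble training here (omitted in this sketch)

    return best, model_pool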
4. The automatic training method of claim 3, wherein the step of annotating the original data set provided by the user to generate an annotation file of the original data set specifically comprises:
Responding to a target task type selected by a user, and determining a target data set catalog corresponding to the target task type;
Providing annotation service based on an interface corresponding to the task type aiming at the target task type, and displaying a data set in the target data set catalog;
And generating a labeling file of the data set corresponding to the target task type.
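Claim 4's mapping from a selected task type to its data set catalog, and the generation of a labeling file for that catalog, might look roughly like the sketch below; the catalog layout, file names, and JSON schema are assumptions, since the claims do not fix a concrete storage format.

import json
from pathlib import Path

# Hypothetical correspondence between task types and data set catalogs;
# in the claimed system this correspondence is configured in advance.
TASK_CATALOGS = {
    "image_classification": Path("/datasets/image_classification"),
    "object_detection": Path("/datasets/object_detection"),
    "text_classification": Path("/datasets/text_classification"),
}


def annotate(task_type, labels):
    """Write a labeling file for the data set of the selected task type.

    `labels` maps a sample file name to the annotation the user entered
    in the task-type-specific labeling interface.
    """
    catalog = TASK_CATALOGS[task_type]                    # target data set catalog
    samples = sorted(p.name for p in catalog.iterdir())   # data displayed to the user
    annotation = {
        "task_type": task_type,
        "samples": [{"file": s, "label": labels.get(s)} for s in samples],
    }
    out_path = catalog / "annotations.json"               # stored via the data storage component
    out_path.write_text(json.dumps(annotation, indent=2, ensure_ascii=False))
    return out_path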
5. The automatic training method of claim 3, wherein the step of invoking a model deployment component to deploy the trained model as an online reasoning service comprises:
obtaining a trained model from the data storage component;
when the trained model is acquired for the first time and the trained model does not need to be updated, loading the trained model by using a neural network reasoning engine, and deploying the trained model as a reasoning service.
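One detail shared by claims 1 and 3 is the data set id that the overall planning module derives from current and/or historical data set information as a marker of whether the data has substantively changed. A content fingerprint such as the hash below is one possible, purely illustrative realisation; the claims do not mandate any particular scheme.

import hashlib
from pathlib import Path


def dataset_id(dataset_dir):
    # Fingerprint the data set directory: the id changes whenever files are
    # added, removed, or change in size, i.e. whenever the data set itself changes.
    digest = hashlib.sha256()
    for path in sorted(Path(dataset_dir).rglob("*")):
        if path.is_file():
            digest.update(str(path.relative_to(dataset_dir)).encode())
            digest.update(str(path.stat().st_size).encode())
    return digest.hexdigest()[:16]

When the id computed for the current data set differs from the id stored with a model in the model pool, the scheduler treats the data as changed and restarts reference-algorithm training, as recited in the claims.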
CN202011563536.6A 2020-12-25 2020-12-25 Neural network interactive automatic training system and method Active CN112529167B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011563536.6A CN112529167B (en) 2020-12-25 2020-12-25 Neural network interactive automatic training system and method
PCT/CN2021/110156 WO2022134600A1 (en) 2020-12-25 2021-08-02 Interactive automatic training system and method for neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011563536.6A CN112529167B (en) 2020-12-25 2020-12-25 Neural network interactive automatic training system and method

Publications (2)

Publication Number Publication Date
CN112529167A CN112529167A (en) 2021-03-19
CN112529167B true CN112529167B (en) 2024-05-14

Family

ID=74976412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011563536.6A Active CN112529167B (en) 2020-12-25 2020-12-25 Neural network interactive automatic training system and method

Country Status (2)

Country Link
CN (1) CN112529167B (en)
WO (1) WO2022134600A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529167B (en) * 2020-12-25 2024-05-14 东云睿连(武汉)计算技术有限公司 Neural network interactive automatic training system and method
CN113469358A (en) * 2021-07-05 2021-10-01 北京市商汤科技开发有限公司 Neural network training method and device, computer equipment and storage medium
CN113435597A (en) * 2021-07-07 2021-09-24 杭州海康威视数字技术股份有限公司 Model generation and scheduling method, system and device
CN113657597B (en) * 2021-07-15 2023-07-11 博流智能科技(南京)有限公司 AI training test website architecture system and AI training test method
CN114154406A (en) * 2021-11-22 2022-03-08 厦门深度赋智科技有限公司 AI model automatic modeling system based on black box optimizer
CN114358158B (en) * 2021-12-24 2023-10-31 苏州万店掌网络科技有限公司 Model training method, device, equipment and storage medium
CN117278419A (en) * 2022-06-15 2023-12-22 华为技术有限公司 Data transmission method, communication device and communication system
CN115099352A (en) * 2022-07-05 2022-09-23 北京火山引擎科技有限公司 Model training system, model training method and device
CN114861836B (en) * 2022-07-05 2022-10-28 浙江大华技术股份有限公司 Model deployment method based on artificial intelligence platform and related equipment
CN115661608A (en) * 2022-10-24 2023-01-31 苏州大学 Training of cultural relic disease labeling model and cultural relic disease labeling method and software
CN115860141A (en) * 2022-12-23 2023-03-28 深圳市魔数智擎人工智能有限公司 Automatic machine learning interactive black box visual modeling method and system
CN117077726B (en) * 2023-10-17 2024-01-09 之江实验室 Method, device and medium for generating in-memory computing neural network model
CN117331339B (en) * 2023-12-01 2024-02-06 南京华视智能科技股份有限公司 Coating machine die head motor control method and device based on time sequence neural network model
CN117474125B (en) * 2023-12-21 2024-03-01 环球数科集团有限公司 Automatic training machine learning model system
CN117873789B (en) * 2024-03-13 2024-05-10 之江实验室 Checkpoint writing method and device based on segmentation quantization

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108363714A (en) * 2017-12-21 2018-08-03 北京至信普林科技有限公司 A kind of method and system for the ensemble machine learning for facilitating data analyst to use
CN109242825A (en) * 2018-07-26 2019-01-18 北京首钢自动化信息技术有限公司 A kind of steel surface defect identification method and device based on depth learning technology
US20200272899A1 (en) * 2019-02-22 2020-08-27 Ubotica Technologies Limited Systems and Methods for Deploying and Updating Neural Networks at the Edge of a Network
CN111310934B (en) * 2020-02-14 2023-10-17 北京百度网讯科技有限公司 Model generation method and device, electronic equipment and storage medium
CN112529167B (en) * 2020-12-25 2024-05-14 东云睿连(武汉)计算技术有限公司 Neural network interactive automatic training system and method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229686A (en) * 2016-12-14 2018-06-29 阿里巴巴集团控股有限公司 Model training, Forecasting Methodology, device, electronic equipment and machine learning platform
CN109993300A (en) * 2017-12-29 2019-07-09 华为技术有限公司 A kind of training method and device of neural network model
CN110263939A (en) * 2019-06-24 2019-09-20 腾讯科技(深圳)有限公司 A kind of appraisal procedure, device, equipment and medium indicating learning model
CN110275928A (en) * 2019-06-24 2019-09-24 复旦大学 Iterative entity relation extraction method
CN111931944A (en) * 2020-07-14 2020-11-13 东云睿连(武汉)计算技术有限公司 Deep learning guide device and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"可变嵌套深度神经网络自动训练方法";郝佳恺等,;《全国第四届"智能电网"会议论文集》;20191228;全文 *

Also Published As

Publication number Publication date
WO2022134600A1 (en) 2022-06-30
CN112529167A (en) 2021-03-19

Similar Documents

Publication Publication Date Title
CN112529167B (en) Neural network interactive automatic training system and method
Rasmussen NengoDL: Combining deep learning and neuromorphic modelling methods
US11520583B2 (en) Serverless workflow enablement and execution platform
US20030088565A1 (en) Method and system for mining large data sets
CN111191879B (en) Comprehensive evaluation method and system
CN110516000B (en) Workflow management system supporting complex workflow structure
Markov et al. Looper: An end-to-end ml platform for product decisions
Uysal Machine learning and data science project management from an agile perspective: Methods and challenges
CN111882062B (en) Reinforced learning-based Windows window application program self-learning intelligent robot and working method thereof
US20230176928A1 (en) Serverless Workflow Enablement and Execution Platform
Hoffmann et al. Improving automated hyperparameter optimization with case-based reasoning
US20230125223A1 (en) Semantic matching between a source screen or source data and a target screen using semantic artificial intelligence
Sitek et al. A declarative framework for constrained search problems
Muhafız Yıldız et al. Flight Gate Assignment Problem with Reinforcement Learning
US11556864B2 (en) User-notification scheduling
KR102498533B1 (en) Interactive electronic device for performing functions of providing responses to questions from users and real-time conversation with the users using models learned by deep learing technique and operating method thereof
KR102657887B1 (en) Template-based enterprise custom ai model building solution delivery methods, devices, and systems
CN117057755B (en) Process hot updating method, equipment and medium for industrial control equipment
US11645097B2 (en) System and method for the orchestration of plugins on the command line
US20230036747A1 (en) Cloud infrastructure planning assistant via multi-agent ai
US20240053727A1 (en) Type cache for package management of robotic process automations field
Feng et al. Incorporating business rule engine technology in control center applications-toward adaptive it solutions
US20240231917A9 (en) Automatic merging of variants of automatable tasks for robotic process automation
Damiani et al. Soft computing in software engineering
Wang et al. Deep Recurrent Q-Network for Cloud Manufacturing Scheduling Problems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant