CN114707667A - Data-driven automatic model training and application system

Info

Publication number: CN114707667A
Application number: CN202210475293.3A
Authority: CN
Inventors: 王羽; 葛唯益; 王菁; 荀智德; 刘亚军; 陆辰
Original assignee: CETC 28 Research Institute
Current assignee: CETC 28 Research Institute
Priority date: 2022-04-29
Filing date: 2022-04-29
Publication date: 2022-07-05

Abstract

The invention discloses a data-driven automatic model training and application system. The system expands the quantity of labeled data by applying multiple kinds of data transformation to heterogeneous labeled data such as text and image data; designs a suitable network structure and hyperparameters for the enhanced labeled data set using neural architecture search; and finally performs model distillation according to the software and hardware conditions and data characteristics of the deployment end, then publishes the model to the server for deployment. The system addresses the practical problems in deep learning that training data are difficult to prepare, parameters are difficult to tune, and trained models demand substantial hardware resources, and it enables automatic model training and publication of models as services from a small amount of standard labeled data.

Description

Data-driven automatic model training and application system

Technical Field

The invention relates to the technical field of machine learning, in particular to a data-driven automatic model training and application system.

Background

In artificial intelligence technologies represented by deep learning, computing power, data (labeled data), and algorithms are the three core elements that carry artificial intelligence from concept to practical application.

The development of computing power depends on hardware advances in science and technology. Hardware devices are currently iterating rapidly and can increasingly meet the requirements of deep learning training and prediction.

In terms of data, the big-data era supplies deep learning with massive amounts of raw, unlabeled data. However, because labeling capacity is limited, the raw data are difficult to use quickly and effectively, and the scale and quality of labeled data in existing vertical fields are insufficient to produce high-quality intelligent models, which has gradually become a bottleneck in intelligent system development. How to form an automatic labeling model from a small number of labeled samples, how to use the automatically labeled data to retrain the model iteratively, and how to support the generation of intelligent staff-support business models when military samples are scarce are all difficult problems.

In terms of algorithms, the development threshold of intelligent algorithms is high, and they are difficult for ordinary application system developers to use. First, the choice of algorithm framework depends heavily on developer experience; the existing frameworks (Spark, TensorFlow, PyTorch, and the like) each have their own strengths, and a unified framework scheduling mechanism and algorithm construction environment are lacking. Second, many open-source packages, such as the machine learning and data mining software Weka and the machine learning toolkit sklearn (scikit-learn), provide a wide range of complex learning algorithms and feature selection methods; although this lowers the threshold of intelligent methods to some extent, it leaves developers with the difficulty of choosing among algorithm packages. Finally, default hyperparameter values rarely yield the best algorithm performance, and there is no explicit basis for setting hyperparameters reasonably, so tuning relies mainly on experience and multiple rounds of trial and error. How to construct a data set, select a set of algorithms, set a series of hyperparameters, and automatically carry out training to obtain a well-performing model has therefore become a practical requirement for deploying intelligent technology.

Disclosure of Invention

The purpose of the invention is as follows: the invention aims to solve the technical problem of providing a data-driven automatic model training and application system aiming at the defects of the prior art.

The technical scheme is as follows: in order to solve the technical problem, the invention discloses a data-driven automatic model training and application system, which comprises a data enhancement layer, a model automatic learning training layer, a model distillation layer and a service distribution layer, wherein the data enhancement layer is connected with the model automatic learning training layer through a network;

the data enhancement layer is used for accessing labeled data and performing data enhancement on the labeled data to obtain enhanced labeled data;

the model automatic learning training layer is used for training a model on the enhanced labeled data and obtaining a trained model after searching the model parameters;

the model distillation layer is used for carrying out compression distillation on the trained model to obtain a model after the compression distillation;

and the service distribution layer is used for packaging the trained model and/or the compressed and distilled model into an intelligent service and managing the published online model services.

Further, the labeled data in the data enhancement layer comprise heterogeneous data of different types: text data, image data, and formatted data; the text data enhancement comprises configuring a data enhancement strategy for the labeled text, transforming the text based on the data enhancement strategy, and automatically generating a labeled corpus;

the image data enhancement comprises image spatial transformation and image pixel transformation, and automatically generates labeled image data;

the formatted data enhancement comprises random replacement and dictionary replacement of data in different dimensions, and automatically generates labeled data. This data enhancement approach saves manual labeling effort.
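As an illustration only, random replacement and dictionary replacement on a single formatted record might be sketched in Python as follows. The field names, the `REPLACEMENTS` table, the jitter range, and the function `augment_record` are hypothetical and are not taken from the invention.

```python
import random

# Hypothetical replacement dictionary: field name -> interchangeable values.
REPLACEMENTS = {
    "color": ["red", "blue", "white"],
    "region": ["north", "south", "east", "west"],
}

def augment_record(record, label_field, rng, numeric_jitter=0.05):
    """Return a copy of the record with the label kept and other fields perturbed."""
    out = dict(record)
    for key, value in record.items():
        if key == label_field:
            continue  # never alter the annotation itself
        if key in REPLACEMENTS:
            # Dictionary replacement: swap in a value from the table.
            out[key] = rng.choice(REPLACEMENTS[key])
        elif isinstance(value, (int, float)):
            # Random replacement: jitter numeric fields by a small fraction.
            out[key] = value * (1 + rng.uniform(-numeric_jitter, numeric_jitter))
    return out

rng = random.Random(0)
sample = {"color": "red", "region": "north", "speed": 80.0, "label": "vehicle"}
augmented = [augment_record(sample, "label", rng) for _ in range(3)]
```

Each generated record keeps the original label, so the new rows can be added to the labeled set without further manual annotation.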

Further, the data enhancement strategy for the labeled text comprises entity replacement, synonym replacement, and back-translation of the text,

the entity replacement of the text comprises identifying military entities in the text through entity recognition and finding similar words through a knowledge graph to replace them; the synonym replacement comprises replacing words in the text with synonyms according to a synonym table; the back-translation comprises translating the Chinese text into English through a translation engine and then translating the English back into Chinese;

the image spatial transformation includes rotation, flipping, and cropping of the image, and the image pixel transformation includes adding noise and sharpening.
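The text strategies above can be sketched as follows. This is a minimal illustration under stated assumptions: the lookup tables `SYNONYMS` and `SIMILAR_ENTITIES` stand in for a synonym table and for similar entities retrieved from a knowledge graph, and the translation functions passed to `back_translate` are supplied by the caller; none of these names come from the invention.

```python
import random

# Hypothetical lookup tables; a real system would use a synonym table and
# similar entities retrieved from a knowledge graph.
SYNONYMS = {"quick": ["fast", "rapid"], "big": ["large"]}
SIMILAR_ENTITIES = {"Paris": ["Lyon", "Marseille"]}

def replace_tokens(tokens, table, rng, prob=1.0):
    """Replace each token found in `table` with a random alternative."""
    return [
        rng.choice(table[t]) if t in table and rng.random() < prob else t
        for t in tokens
    ]

def back_translate(text, to_english, to_chinese):
    """Round-trip a text through two caller-supplied translation functions."""
    return to_chinese(to_english(text))

rng = random.Random(42)
tokens = "the quick train left Paris".split()
variant = replace_tokens(replace_tokens(tokens, SYNONYMS, rng), SIMILAR_ENTITIES, rng)
```

Because replacements keep token positions and entity types, token-level labels on the original sentence can be carried over to the variant.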

Further, the model automatic learning training layer combines historical model training results with the experience of algorithm research and development personnel to form a mapping network among business scenarios, model training algorithms, and the frameworks used to implement the model training algorithms. The mapping network associates business scenarios with model training algorithms, so that different candidate sets of model training algorithms are found for different business scenarios, where the same model training algorithm may be implemented in different frameworks; when a user selects a task requirement and labeled data, a matched model training algorithm and implementation framework are selected automatically according to the mapping network, wherein the matched model training algorithm comprises two or more deep learning models, and each deep learning model corresponds to one implementation framework; and model network parameters are selected and tuned automatically through the matched model training algorithm to obtain a trained model, which reduces the large amount of time algorithm developers would otherwise spend on tuning.

Further, automatically selecting and tuning the model network parameters through the matched model training algorithm to obtain the trained model includes:

inputting the enhanced labeled data together with the hyperparameters and network structure parameters of the two or more deep learning models into the model automatic learning training layer, and generating a candidate set of network structures to be searched, namely a search space, through a NAS (Neural Architecture Search) algorithm;

searching for model network parameters in the search space based on a search strategy, wherein the search strategies include exhaustive search, search in a continuous space, search in a discrete space, and the like;

and evaluating the model network parameter result through model training to obtain a network performance evaluation index and a trained model.

Further, the searching model network parameters in the search space based on the search strategy comprises:

embedding the network structures into a continuous space, wherein each point in the continuous space corresponds to one network structure and a prediction function for accuracy can be defined on that space;

performing gradient-based optimization with the prediction function as the objective function to find the embedded representation of a better network structure; and after the optimization is completed, mapping the embedded representation back to a network structure to obtain recommended parameter values, thereby finding a suboptimal solution in the search space that meets the requirement.

More specifically, the search strategies include grid search, random search, genetic algorithms, Bayesian optimization, evolutionary methods, particle swarm optimization, reinforcement learning, and gradient-based algorithms. Reinforcement learning and genetic algorithms still search essentially in a discrete space and treat the objective function as a black box. If the search space is continuous and the objective function is differentiable, the search can be carried out more efficiently using gradient information, so the present invention searches in a gradient-based manner to find a suboptimal solution that meets the requirement.
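The embed, optimize, and decode cycle described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two-parameter search space `CANDIDATES`, the linear embedding, and the quadratic surrogate `predicted_accuracy` (a real prediction function would be learned from evaluated architectures).

```python
# Hypothetical search space: (number of layers, hidden units) pairs.
CANDIDATES = [(layers, units) for layers in (2, 4, 6, 8) for units in (64, 128, 256)]

def embed(arch):
    """Map a discrete architecture to a point in the continuous space [0, 1]^2."""
    layers, units = arch
    return (layers / 8.0, units / 256.0)

def predicted_accuracy(z):
    """Toy differentiable surrogate; a real one would be fitted to past trials."""
    return 0.9 - (z[0] - 0.5) ** 2 - (z[1] - 0.5) ** 2

def gradient(z):
    """Analytic gradient of `predicted_accuracy` with respect to z."""
    return (-2 * (z[0] - 0.5), -2 * (z[1] - 0.5))

def decode(z):
    """Map an embedding back to the nearest candidate architecture."""
    return min(CANDIDATES, key=lambda a: sum((e - c) ** 2 for e, c in zip(embed(a), z)))

def gradient_search(start, lr=0.1, steps=50):
    """Gradient ascent on the surrogate, then decode to a discrete structure."""
    z = embed(start)
    for _ in range(steps):
        g = gradient(z)
        z = tuple(c + lr * gc for c, gc in zip(z, g))
    return decode(z)

best = gradient_search((8, 64))
```

The decoded structure is only as good as the surrogate, which is why the recommended parameters are still passed to the evaluation step for actual training.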

Further, the model network parameter result is evaluated through model training, and the resulting network performance evaluation indexes include accuracy, recall, F1 value, ROC curve (Receiver Operating Characteristic curve), and loss curve. The evaluation process comprises training a model according to the suboptimal solution found by the search and obtaining the corresponding evaluation indexes; the search process and the evaluation process are iterated continuously until a solution meeting the constraint is found, thereby obtaining the trained model.

Further, the model distillation layer takes the trained model as a teacher model, constructs two or more student models using neural networks whose scale is less than 10% of that of the teacher model, transfers the parameter knowledge contained in the teacher model to the student models through knowledge distillation, and finally combines the student models through model ensembling to obtain the compressed and distilled model. The model distillation layer controls model size and improves running efficiency while preserving model accuracy: a model that performs well but has a large parameter scale and loads slowly can be compressed and distilled to improve its efficiency.

Furthermore, the service distribution layer can automatically select a trained model or a compressed and distilled model according to a user request and a performance index requirement on the service and package the trained model or the compressed and distilled model into an intelligent service, so that the customized capacity requirement is met.

Aiming at the requirement of a user on efficiency, performing service encapsulation on the trained model or the compressed and distilled model to ensure that the service quality of the system meets the customization requirement of the user; the managing the published online model service comprises:

determining the use authority according to the user request, starting a corresponding service module, receiving the request data of the user, executing the service request of the user and returning the service result of the system;

performing multi-task concurrency and scheduling for highly dynamic service requests from multiple users, wherein computing resources can be allocated and scheduled reasonably according to differences in user level, service type, task urgency, and remaining resources, improving the utilization of computing resources;

monitoring the model training process and the online services, and displaying in charts indexes such as training progress, loss function, and resource occupation, together with user request rate (QPS), remaining system resources, user response latency, and system failure frequency;

the customized requirements of the users on the services are supported, and the model services can be exported for independent deployment and use.
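The resource scheduling item in the list above can be sketched with a priority queue. This is a simplified illustration: the class `RequestScheduler`, the scoring formula, and the notion of a compute "slot" are hypothetical, and service type is left out of the score for brevity.

```python
import heapq
import itertools

class RequestScheduler:
    """Order service requests by a priority score; the lowest score runs first."""

    def __init__(self, free_slots):
        self.free_slots = free_slots        # remaining compute slots (assumed unit)
        self._heap = []
        self._counter = itertools.count()   # tie-breaker keeps submission order

    def submit(self, name, user_level, urgency, slots_needed):
        # Hypothetical scoring: higher user level and urgency are served earlier.
        score = -(2 * user_level + urgency)
        heapq.heappush(self._heap, (score, next(self._counter), name, slots_needed))

    def dispatch(self):
        """Start queued requests in priority order while resources remain."""
        started, deferred = [], []
        while self._heap:
            item = heapq.heappop(self._heap)
            if item[3] <= self.free_slots:
                self.free_slots -= item[3]
                started.append(item[2])
            else:
                deferred.append(item)       # not enough resources yet
        for item in deferred:
            heapq.heappush(self._heap, item)
        return started

scheduler = RequestScheduler(free_slots=4)
scheduler.submit("batch-report", user_level=1, urgency=1, slots_needed=2)
scheduler.submit("live-alert", user_level=3, urgency=5, slots_needed=3)
scheduler.submit("ad-hoc-query", user_level=2, urgency=2, slots_needed=1)
started = scheduler.dispatch()
```

Requests that do not fit in the remaining resources stay queued and are reconsidered on the next call to `dispatch`.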

Further, after the trained model or the compressed and distilled model is automatically selected and packaged into an intelligent service, a service caller executes a service application task or carries out secondary development in combination with a business system, which comprises the following steps:

the service caller screens the registered models according to business requirements;

invoking the dependency image to create a container, and packaging the model as a REST service;

providing the environment support and resources required to run the intelligent service according to the model requirements;

and the service caller uses the unified interface to execute the service application task or carries out secondary development in combination with the business system.

Has the advantages that:

Using limited labeled data, the invention trains a high-quality deep learning model through labeled-data enhancement, automatic modeling and parameter tuning, model distillation, and service publishing, and quickly publishes the model as a service that can be applied and called on demand, thereby realizing data-driven automatic model training and application. Compared with the prior art, the invention has the following notable advantages: 1) the workload of data annotation is reduced, cutting down tedious and repetitive labeling work; 2) the technical thresholds of algorithm selection and parameter tuning are lowered, so models can be trained quickly and iteratively; 3) while model accuracy is maintained, the resource footprint and running time required by the model are reduced as far as possible, improving model efficiency.

Drawings

The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.

FIG. 1 is a general architecture diagram of a data-driven automated model training and application system according to the present invention.

FIG. 2 is a flow chart of the automatic training environment construction of the data-driven automatic model training and application system of the present invention.

FIG. 3 is a flow chart of a model distillation and service invocation application of a data-driven automated model training and application system of the present invention.

Detailed Description

Embodiments of the present invention will be described below with reference to the accompanying drawings.

The data-driven automatic model training and application system provided by this embodiment can be applied to business scenarios in which data are updated and iterated quickly, the real-time requirements on data processing are high, and algorithm research and development personnel are lacking. For example, social public opinion analysts face social network information that is updated and propagated rapidly, and they urgently need the system to analyze real-time information quickly and to assist in management and control.

The data-driven automatic model training and application system provided by the embodiment of the application is shown in fig. 1 and comprises a data enhancement layer, a model automatic learning training layer, a model distillation layer and a service distribution layer;

In this embodiment, the annotation data in the data enhancement layer includes heterogeneous data of different types: text data, image data, and formatted data;

the text data enhancement comprises configuring a data enhancement strategy for the labeled text, transforming the text based on the data enhancement strategy, and automatically generating a labeled corpus; the data enhancement strategy for the labeled text comprises entity replacement, synonym replacement, and back-translation of the text, wherein the entity replacement comprises identifying military entities in the text through entity recognition and finding similar words through a knowledge graph to replace them; the synonym replacement comprises replacing words in the text with synonyms according to a synonym table; the back-translation comprises translating the Chinese text into English through a translation engine and then translating the English back into Chinese;

the image data enhancement comprises image spatial transformation and image pixel transformation, and automatically generates labeled image data; the image spatial transformation includes rotation, flipping, and cropping of the image, and the image pixel transformation includes adding noise and sharpening.

The formatted data enhancement comprises random replacement and dictionary replacement of data in different dimensions, and automatically generates labeled data.
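For the image branch, the spatial and pixel transformations named above can be sketched on a small grayscale image held as nested lists. The function names and the noise level are illustrative assumptions, and sharpening is omitted to keep the sketch short.

```python
import random

def rotate90(img):
    """Rotate a row-major image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def crop(img, top, left, height, width):
    """Cut out a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]

def add_noise(img, rng, sigma=5.0):
    """Add Gaussian noise and clamp each pixel to the 0..255 range."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row] for row in img]

image = [[0, 10, 20], [30, 40, 50]]   # tiny 2x3 grayscale image
rng = random.Random(0)
variants = [
    rotate90(image),
    flip_horizontal(image),
    crop(image, 0, 1, 2, 2),
    add_noise(image, rng),
]
```

For image-level labels (classification) the label carries over unchanged; for bounding-box labels, the same spatial transformation would also have to be applied to the box coordinates.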

In this embodiment, the model automatic learning training layer combines historical model training results with the experience of algorithm developers to form a mapping network among business scenarios, model training algorithms, and the frameworks used to implement the model training algorithms. The mapping network associates business scenarios with model training algorithms, so that different candidate sets of model training algorithms are found for different business scenarios, where the same model training algorithm may be implemented in different frameworks; for example, when the model training algorithm is a long short-term memory network (LSTM, Long Short-Term Memory), some implementations use the TensorFlow framework and others use the Python machine learning library PyTorch. When a user selects a task requirement and labeled data, a matched model training algorithm and implementation framework are selected automatically according to the mapping network, wherein the matched model training algorithm comprises two or more deep learning models, and each deep learning model corresponds to one implementation framework; and model network parameters are selected and tuned automatically through the matched model training algorithm to obtain a trained model.
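One simple way to picture the mapping network is as a nested lookup from business scenario to algorithms to frameworks. The sketch below is illustrative only: the scenario names, the `TextCNN` entry, and the framework assigned to each algorithm are assumptions, not contents of the invention's mapping network.

```python
# Hypothetical mapping: business scenario -> candidate algorithms -> frameworks.
MAPPING = {
    "text-classification": {"LSTM": ["TensorFlow", "PyTorch"], "TextCNN": ["PyTorch"]},
    "image-object-detection": {"YOLO v3": ["PyTorch"], "Fast R-CNN": ["TensorFlow"]},
}

def candidate_models(scenario, min_models=2):
    """Return (algorithm, framework) pairs, one implementation framework per model."""
    algorithms = MAPPING.get(scenario)
    if algorithms is None:
        raise KeyError(f"no algorithms registered for scenario {scenario!r}")
    pairs = [(algo, fw) for algo, frameworks in algorithms.items() for fw in frameworks]
    if len(pairs) < min_models:
        raise ValueError("the mapping must yield at least two deep learning models")
    return pairs

pairs = candidate_models("text-classification")
```

In this picture, updating the mapping from historical training results amounts to adding, removing, or reordering entries in the table.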

Automatically selecting and tuning the model network parameters through the matched model training algorithm to obtain the trained model comprises the following steps:

inputting the enhanced labeling data and the hyper-parameters and the network structure parameters of the more than two deep learning models into an automatic model learning training layer, and generating a candidate set of a network structure to be searched, namely a search space, through an NAS algorithm;

model network parameter searching is carried out in a searching space based on a searching strategy;

Searching for model network parameters in the search space based on the search strategy comprises the following steps:

optimizing based on gradient by taking the prediction function as a target function to find out the embedded representation of a better network structure; and after the optimization is completed, mapping the embedded representation back to the network structure to obtain a parameter recommendation value, and finding out a suboptimal solution meeting the requirement in a search space.

The model network parameter result is evaluated through model training to obtain network performance evaluation indexes including accuracy, recall, F1 value, ROC curve, and loss curve. The evaluation process comprises training a model according to the suboptimal solution found by the search and obtaining the corresponding evaluation indexes; the search process and the evaluation process are iterated continuously until a solution meeting the constraint is found, so as to obtain the trained model.
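As a small worked example of the scalar indexes above, accuracy, recall, and F1 for a binary task can be computed from true and predicted labels as follows. The helper names and the example constraint of 0.7 are illustrative; the ROC and loss curves are not covered by this sketch.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def meets_constraint(metrics, min_f1):
    """Stop condition for the search/evaluation loop (threshold is an assumption)."""
    return metrics["f1"] >= min_f1

metrics = binary_metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])
done = meets_constraint(metrics, min_f1=0.7)
```

Here three of the four positives are found and one negative is misclassified, so accuracy, precision, recall, and F1 all equal 0.75.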

In this embodiment, the model distillation layer takes the trained model as a teacher model, constructs two or more student models using neural networks whose scale is less than 10% of that of the teacher model, transfers the parameter knowledge contained in the teacher model to the student models through knowledge distillation, and finally combines the student models through model ensembling to obtain the compressed and distilled model.
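The description does not state which distillation loss or ensembling rule is used. As one common formulation, assumed here purely for illustration, each student can be trained to match the teacher's temperature-softened output distribution, and the students' probabilities can be averaged at prediction time:

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the softened teacher distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def ensemble_predict(student_logits_list):
    """Average the students' probabilities and return the arg-max class index."""
    probs = [softmax(logits) for logits in student_logits_list]
    mean = [sum(col) / len(probs) for col in zip(*probs)]
    return mean.index(max(mean))

teacher = [3.0, 1.0, 0.2]
students = [[2.0, 1.0, 0.0], [1.5, 1.2, 0.1]]
losses = [distillation_loss(teacher, s) for s in students]
prediction = ensemble_predict(students)
```

A higher temperature spreads probability over the non-target classes, which is the information a small student cannot get from hard labels alone.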

In this embodiment, the service distribution layer can automatically select a trained model or a compressed and distilled model according to a user request and a performance index requirement for the service, and package the trained model or the compressed and distilled model into an intelligent service; the managing the published online model service comprises:

performing multi-task concurrency and scheduling for highly dynamic service requests from multiple users, wherein computing resources can be allocated and scheduled reasonably according to differences in user level, service type, task urgency, and remaining resources;

In this embodiment, the data-driven automatic model training and application system performs the following steps.

Step 1, accessing, importing, and processing a labeled data set, and configuring data enhancement strategy parameters according to the data type, so as to enhance text, image, and formatted data and increase the data volume while keeping the data distribution unchanged.

Step 2, automatically selecting and confirming a deep learning model algorithm according to business requirements, and searching for and confirming the parameters of the deep learning model algorithm, with the following specific steps:

step 2.1, selecting a model training algorithm

Algorithms are numerous and varied across tasks. Taking classification alone as an example, dozens of classification algorithms are in common use, and with the many algorithms designed for specific problems, hundreds of candidates may be available for a single classification task. A mapping network is constructed from the business scenarios, models, and frameworks supported by the platform, associating each business with algorithms. Different algorithms, implemented in different frameworks, are found for different application businesses; when a user selects a task requirement and labeled data, a model algorithm is selected automatically and a new container is created to build a new training environment, as shown in fig. 2. Specifically, in image target recognition tasks, different model algorithms are provided for different target types, such as recognition of aircraft models, ship models, and vehicle types; candidate algorithms include the target detection algorithm YOLO v3, the fast region-based convolutional neural network Fast R-CNN, the region-based convolutional neural network R-CNN (Region-based Convolutional Neural Network), and the like, and the algorithm models need to be screened for relevance according to the specific business requirements.

Step 2.2, automatic learning training

Automatic learning training typically involves two processes, search and evaluation. In the search process, a searcher looks for suboptimal solutions that may meet the requirements within a huge search space; commonly used searchers include evolutionary algorithms, Monte Carlo tree search, Bayesian searchers, reinforcement learning, and the like. The evaluation process trains models according to the candidate solutions found and obtains the corresponding evaluation indexes. This step is often time-consuming, which has motivated acceleration techniques such as weight sharing and index prediction. The search and evaluation processes are iterated continuously until the index function of the algorithm reaches the required threshold or the iteration limit is reached.
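The alternation of search and evaluation with its two stop conditions can be sketched as below. Random search stands in for the searchers named above, and `evaluate` is a toy scoring function in place of real model training; the configuration fields and the threshold are assumptions made for illustration.

```python
import random

def search_and_evaluate(sample, evaluate, threshold, max_iters):
    """Alternate search and evaluation until a stop condition is met."""
    best_cfg, best_score, iterations = None, float("-inf"), 0
    while iterations < max_iters and best_score < threshold:
        cfg = sample()              # search step: propose a candidate
        score = evaluate(cfg)       # evaluation step: train and measure
        iterations += 1
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score, iterations

rng = random.Random(1)

def sample():
    # Random search stands in for the searchers named in the text.
    return {"lr": rng.choice([0.1, 0.03, 0.01, 0.003]), "layers": rng.choice([2, 4, 6])}

def evaluate(cfg):
    # Toy stand-in for model training; returns a pseudo validation score.
    return 1.0 - 5 * abs(cfg["lr"] - 0.01) - 0.02 * abs(cfg["layers"] - 4)

best_cfg, best_score, iterations = search_and_evaluate(
    sample, evaluate, threshold=0.95, max_iters=20
)
```

Swapping `sample` for a smarter searcher changes how candidates are proposed without touching the loop or its stop conditions.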

In the specific implementation, Auto-PyTorch, which is based on the PyTorch framework, is used for automatic model parameter tuning. The process mainly comprises the following steps. First, preprocessing operations such as splitting and encoding are performed on the enhanced labeled data so that the data can be processed by the framework of the model training algorithm, and ten-fold or K-fold cross-validation is applied to the data (automatic model training). Next, a preliminary evaluation baseline is established from an existing model or an open-source general-purpose model, for example by iteratively training an existing, weaker model such as one from sklearn with a fixed hyperparameter configuration. Then the budget for each training iteration, the maximum resource consumption, and the termination rule are defined, and the model is trained over multiple iterations. Finally, after the model converges and an ensemble model is formed, the performance of the model and its neural network structure are checked.
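The cross-validation step can be illustrated independently of any framework. The generator below is a plain-Python sketch of K-fold index splitting; it is not the Auto-PyTorch API, and the function name and seed parameter are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

splits = list(k_fold_indices(25, k=10))
```

Each sample appears in exactly one validation fold, so averaging the per-fold scores uses every labeled example for validation once.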

Step 3, distilling the trained model: a model obtained through automatic training that performs well but has a large number of parameters is taken as the teacher model, a plurality of student models are constructed using small-scale neural networks, the parameter knowledge contained in the teacher model is transferred to the student models through knowledge distillation, and finally the student models are combined through model ensembling.

Step 4, service encapsulation and calling: service encapsulation is performed mainly on the models trained by the algorithm. The module determines whether the original model or the distilled model is required according to user requirements and uses a preset module to encapsulate the model as a service. The module also provides multi-replica deployment and high-load calling of the REST service, and can support fine-grained secure sharing of the various services within a project as well as overall scheduling and on-demand adjustment of high-performance computing resources. A typical application flow of service encapsulation and calling is shown in fig. 3. The module is intended for service callers: services are packaged and published based on the intelligent models produced by algorithm modelers, and service callers use a unified interface to execute service application tasks or carry out secondary development in combination with a business system. The specific process comprises the following steps:

First, the service caller screens the registered models according to business requirements;

Second, the dependency image is invoked to create a container, and the model is packaged as a REST service;

Third, the environment support and resources required to run the intelligent service are provided according to the model requirements;

Fourth, the service caller uses the unified interface to execute the service application task or carries out secondary development in combination with the business system.
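Packaging a model as a REST service can be sketched with the Python standard library alone. This is a minimal illustration, not the system's actual service module: the `/predict` path, the JSON request shape, and the placeholder `predict` function are all assumptions, and container creation from the dependency image is outside the scope of the sketch.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Placeholder for the packaged model; returns a dummy score."""
    return {"score": sum(features) / len(features)}

def handle_predict(body):
    """Turn a JSON request body into (status code, JSON response bytes)."""
    try:
        features = json.loads(body)["features"]
        result = predict([float(x) for x in features])
        return 200, json.dumps(result).encode()
    except (KeyError, TypeError, ValueError, ZeroDivisionError) as exc:
        return 400, json.dumps({"error": str(exc)}).encode()

class ModelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":   # hypothetical unified interface path
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

def serve(port=8080):
    """Run the service; inside a container this would be the entry point."""
    HTTPServer(("0.0.0.0", port), ModelHandler).serve_forever()
```

Keeping the request handling in `handle_predict`, separate from the HTTP plumbing, lets the same logic be exercised without starting a server, for example `handle_predict(b'{"features": [1, 2, 3]}')`.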

Through the integrated intelligent processing service, the business application reduces the manual labeling and model-training intervention in the business processing flow, raises the degree of automation in the interaction between the business application and users, the application environment, and other applications, strengthens business processing capability, and can continue to learn iteratively from application feedback to improve model capability.

The present invention provides a data-driven automatic model training and application system, and there are many methods and ways to implement this technical solution; the above description is only a specific embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention. All components not specified in the present embodiment can be realized by the prior art.

Claims

1. A data-driven automatic model training and application system is characterized by comprising a data enhancement layer, a model automatic learning training layer, a model distillation layer and a service distribution layer;

2. The system of claim 1, wherein the annotation data in the data enhancement layer comprises heterogeneous data of different types: text data, image data, and formatted data; the text data enhancement comprises configuring a data enhancement strategy aiming at the marked text, carrying out deformation conversion based on the data enhancement strategy and automatically generating a marked corpus;

3. The data-driven automated model training and application system of claim 2, wherein the data enhancement strategy for labeled text comprises entity replacement, synonym replacement, and back-translation of the text,

4. The system of claim 3, wherein the model automatic learning training layer combines historical model training results with the experience of algorithm developers to form a mapping network among business scenarios, model training algorithms, and the frameworks used to implement the model training algorithms, wherein the mapping network is capable of associating the business scenarios with the model training algorithms to find different candidate sets of model training algorithms for different business scenarios, and wherein the same model training algorithm may be implemented in different frameworks; when a user selects a task requirement and labeled data, a matched model training algorithm and implementation framework are selected automatically according to the mapping network, wherein the matched model training algorithm comprises two or more deep learning models, and each deep learning model corresponds to one implementation framework; and model network parameters are selected and tuned automatically through the matched model training algorithm to obtain a trained model.

5. The system of claim 4, wherein the automatic model network parameter selection and tuning performed by the matched model training algorithm comprises:

6. The system of claim 5, wherein the searching for model network parameters in a search space based on a search strategy comprises:

7. The system of claim 6, wherein the model network parameter result is evaluated through model training to obtain network performance evaluation indexes, wherein the network performance evaluation indexes comprise accuracy, recall, F1 value, ROC curve, and loss curve; the evaluation process comprises training a model according to the suboptimal solution found by the search and obtaining the corresponding evaluation indexes; and the search process and the evaluation process are iterated continuously until a solution meeting the constraint is found, so as to obtain the trained model.

8. The system of claim 7, wherein the model distilling layer takes the trained model as a teacher model, constructs two or more student models using a neural network with a scale of less than 10% of the teacher model, transfers parameter knowledge contained in the teacher model to the student models by knowledge distillation, and finally integrates the student models by model integration to obtain the model after compression distillation.

9. The system of claim 8, wherein the service distribution layer is capable of automatically selecting a trained model or a compressed distilled model to package into an intelligent service according to a user request and a performance index requirement for the service; the managing the published online model service comprises:

10. The system of claim 9, wherein after the trained model or the compressed and distilled model is automatically selected and encapsulated into an intelligent service, a service caller performs a service application task or performs secondary development in combination with a service system, comprising: