CN115618239B - Management method, system, terminal and medium for deep learning framework training
- Publication number
- CN115618239B (application CN202211617170.5A / CN202211617170A)
- Authority
- CN
- China
- Prior art keywords
- training
- deep learning
- data
- file
- source
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a management method, system, terminal and medium for deep learning framework training, relating to the field of deep learning framework management. The technical scheme is as follows: obtain source files corresponding to a plurality of deep learning frameworks; decode the configuration file corresponding to each source file and obtain its configuration items from the decoding result; decode the instruction execution file corresponding to the source file and obtain an instruction combination set from its decoding result; decode the training environment file corresponding to the source file and obtain, from its decoding result, the program dependency package used to construct the training environment; store the source code of the deep learning framework's source files in a code database and obtain the source code's download address there; and bind the configuration items, the instruction combination set, the program dependency package and the source-code download address to the corresponding deep learning framework and store them in a data sharing library.
Description
Technical Field
The present invention relates to the field of deep learning framework management, and more particularly, to a management method, system, terminal and medium for deep learning framework training.
Background
In the technical field of deep learning model training, a model can be trained with any of several deep learning frameworks. Although the frameworks differ, the basic process of training a deep learning model is the same across all of them: data processing, parameter setting, model training, model export, and so on.
In the related art, the individual deep learning frameworks lack effective management, so the basic flow of model training is not standardized. As a result, training efficiency is low, and work is repeated whenever training has to be adapted to a different scenario.
Therefore, how to manage the basic training process across multiple deep learning frameworks is an urgent problem in the related art.
Disclosure of Invention
The invention aims to solve the problem of managing the basic training processes of multiple deep learning frameworks in the related art, and provides a management method, system, terminal and medium for deep learning framework training.
The technical purpose of the invention is realized by the following technical scheme:
in a first aspect of the present application, a management method for deep learning framework training is provided, where the method includes:
obtaining source files corresponding to a plurality of deep learning frameworks;
decoding a configuration file corresponding to the source file, and obtaining a configuration item of the configuration file according to a decoding result of the configuration file, wherein the configuration file comprises a data configuration file and a weight configuration file;
decoding an instruction execution file corresponding to the source file, and obtaining an instruction combination set according to a decoding result of the instruction execution file;
decoding a training environment file corresponding to the source file, and obtaining a program dependency package for constructing the training environment file according to the decoding result of the training environment file;
storing source codes of source files of the deep learning framework into a code database, and acquiring download addresses of the source codes in the code database;
and binding the configuration items, the instruction combination set, the program dependency package and the download addresses of the source codes of the source files with the corresponding deep learning framework, and storing them in the data sharing library.
In some possible embodiments, the data configuration file includes at least one of a training set, a validation set, a test set, and a data padding format (any one, any two, any three, or all four);
the weight configuration file includes a weight file, the number of training iterations, the number of images per weight update, and the input image size.
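For concreteness, the following is a minimal sketch of the shape such decoded configuration items could take. Every field name, default value and path here is a hypothetical illustration; the patent does not fix a concrete schema.

```python
# Hypothetical shape of decoded configuration items; all names and defaults
# below are assumptions made for illustration only.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class DataConfig:
    train_set: Optional[str] = None       # address of the training set, if configured
    validation_set: Optional[str] = None  # any subset of the four items may be present
    test_set: Optional[str] = None
    padding_format: Optional[str] = None  # data padding format

@dataclass
class WeightConfig:
    weight_file: str = "init.weights"
    training_iterations: int = 10000
    images_per_weight_update: int = 64    # the "weight update image amount"
    input_image_size: Tuple[int, int] = (416, 416)

config_items = {
    "data": DataConfig(train_set="share://datasets/train",
                       validation_set="share://datasets/val"),
    "weights": WeightConfig(),
}
```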
In some possible embodiments, the instruction combination set includes an execution environment, a pre-execution command, a command file, and post-parameter commands, where the pre-execution command is generated from the execution environment, the command file to be executed is specified by the pre-execution command, and the pre-execution command or the command file is constrained by the post-parameters. A sketch of assembling such a combination follows.
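The sketch below combines the four elements into one runnable command line, in the order given above. The concrete environment variable, interpreter and parameters are assumptions for illustration, not values the patent prescribes.

```python
# Assemble execution environment -> pre-execution command -> command file ->
# post-parameter commands into a single shell invocation (all values assumed).
import shlex

def build_command(execution_env: dict, pre_command: str,
                  command_file: str, post_params: list) -> str:
    env_prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in execution_env.items())
    return f"{env_prefix} {pre_command} {shlex.quote(command_file)} " + " ".join(post_params)

cmd = build_command(
    execution_env={"CUDA_VISIBLE_DEVICES": "0"},           # execution environment
    pre_command="python",                                  # generated from the environment
    command_file="train.py",                               # the command file to execute
    post_params=["--epochs", "50", "--batch-size", "64"],  # post-parameter constraints
)
print(cmd)  # CUDA_VISIBLE_DEVICES=0 python train.py --epochs 50 --batch-size 64
```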
In some possible embodiments, a program dependency package for constructing the training environment file is obtained from the decoding result of the training environment file, an image of the training environment is built and stored in a training-environment shared library, and a download address for the image is generated. The training-environment shared library is a virtual image repository from which the image of the corresponding training environment can be exported.
In some possible embodiments, the method further comprises:
creating one or more deep learning model training tasks on one or more terminals, and matching a corresponding deep learning framework according to each deep learning model;
loading training data according to the training task, and configuring training parameters according to the training data;
acquiring the memory resources of the one or more terminals; dynamically evaluating them based on the memory occupied by existing training tasks, the memory remaining on each terminal, and the memory required by the created training tasks; distributing the created training tasks to the terminals according to the evaluation result, as shown in the sketch after this list; and, for each terminal assigned a training task, synchronizing the training environment and training data the task requires together with the source code of its matched deep learning framework, where the training data must be synchronized in real time for every training run, while the training environment and the framework source code need to be synchronized only once as long as the training task is unchanged;
and when the training task is finished, triggering a model export instruction to export the trained deep learning model.
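The following is a sketch of the dynamic memory evaluation referenced in the list above: each created task is placed on a terminal whose remaining memory covers the task's requirement. The greedy best-fit rule is an assumption; the patent does not prescribe a particular evaluation algorithm.

```python
# Assign tasks to terminals by remaining memory (best fit); illustrative only.
def assign_tasks(free_memory: dict, tasks: list) -> dict:
    """free_memory: terminal name -> free MiB; tasks: list of (name, required MiB)."""
    assignment = {}
    for task_name, required in sorted(tasks, key=lambda t: -t[1]):  # big tasks first
        candidates = [(free, term) for term, free in free_memory.items() if free >= required]
        if not candidates:
            assignment[task_name] = None         # no terminal can host this task yet
            continue
        free, term = min(candidates)             # smallest sufficient remainder
        free_memory[term] = free - required      # reserve the memory
        assignment[task_name] = term
    return assignment

print(assign_tasks({"node-a": 8192, "node-b": 24576},
                   [("detector", 10240), ("classifier", 4096)]))
# {'detector': 'node-b', 'classifier': 'node-a'}
```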
In some possible embodiments, the loading training data according to the training task specifically includes:
acquiring basic data, processing the basic data with a data-enhancement mechanism to obtain training data, and taking the training data as the data to be labeled, wherein the data enhancement includes random flipping, random cropping and random erasing;
labeling the data to be labeled with a preset auxiliary labeling strategy, and generating a label file from the labeling result: when the data to be labeled is identified as image data, it is labeled based on image segmentation; when it is identified as speech data, it is labeled based on speech frame-sequence analysis; and when it is identified as video data, it is labeled based on video frame-sequence analysis.
In some possible embodiments, samples on which the trained deep learning model's recognition rate falls below a recognition-rate threshold in the actual application scene are collected and exported; after the collected data reaches a quantity threshold, a deep learning model retraining mechanism is started, the model is retrained on the collected data together with the training data, and the retrained deep learning model is exported.
In a second aspect of the present application, a management system for deep learning framework training is provided, including:
the source file acquisition module is used for acquiring source files corresponding to a plurality of deep learning frameworks;
the first decoding module is used for decoding the configuration file corresponding to the source file and obtaining the configuration items of the configuration file according to the decoding result of the configuration file, wherein the configuration file comprises a data configuration file and a weight configuration file;
the second decoding module is used for decoding the instruction execution file corresponding to the source file and obtaining an instruction combination set according to the decoding result of the instruction execution file;
the third decoding module is used for decoding the training environment file corresponding to the source file and obtaining a program dependency package for constructing the training environment file according to the decoding result of the training environment file;
the address acquisition module is used for storing the source code of the source file of the deep learning frame in the code database and acquiring the download address of the source code in the code database;
and the data information binding module is used for binding the configuration items, the instruction combination set, the program dependency package and the download addresses of the source codes of the source files with the corresponding deep learning framework, and storing them in the data sharing library.
In a third aspect of the present application, a computer terminal is provided, comprising a memory and a processor, the memory having stored thereon a computer program executable by the processor to cause the processor to implement a management method for deep learning framework training as described in any implementation of the first aspect of the present application.
In a fourth aspect of the present application, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of a management method for deep learning framework training as described in any one of the first aspect of the present application.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention obtains the source files of the deep learning frameworks and analyzes the data of different dimensions in those source files by decoding, thereby completing the extraction of multi-dimensional information and binding the data of each dimension to the corresponding framework. This improves the reusability of the deep learning frameworks: while using a framework, the user focuses on parameter configuration and optimization rather than on building a training environment, repeatedly executing commands and converting training data, which effectively reduces the difficulty of training deep learning models and accelerates their standardized integration.
2. The invention further considers how to make training-data acquisition more effective and provides a data preprocessing and labeling mechanism, which effectively reduces manual error in the circulation of training data and improves the efficiency of that circulation.
3. The invention also provides a model-recognition retraining mechanism: samples on which the trained model's recognition rate falls below a recognition-rate threshold in the actual application scene are collected and exported; after the collected data reaches a quantity threshold, the retraining mechanism is started, the model is retrained on the collected data together with the training data, and the retrained model is exported. This effectively improves the recognition accuracy of the deep learning model and strengthens its generalization ability and robustness.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
fig. 1 is a schematic flowchart of a management method for deep learning framework training according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of deep learning model training provided in an embodiment of the present application;
fig. 3 is a block diagram of a management system for deep learning framework training according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and the accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not used as limiting the present invention.
It is to be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the technical field of deep learning model training, a model can be trained with any of several deep learning frameworks. Although the frameworks differ, the basic process of training a deep learning model is the same across all of them: data processing, parameter setting, model training, model export, and so on.
In the related art, the individual deep learning frameworks lack effective management, so the basic flow of model training is not standardized; training efficiency is therefore low, and work is repeated whenever training has to be adapted to a different scenario.
Therefore, to remedy these defects of the deep learning model training process in the related art, the embodiments of the present application provide a management method, system, terminal and medium for deep learning framework training that solve the problem of managing the basic processes of multiple deep learning frameworks. A user of a deep learning framework can then focus on parameter configuration and optimization rather than on building a training environment, frequently executing commands and transferring training data, which effectively reduces the difficulty of training deep learning models and accelerates their standardized integration.
Referring to fig. 1, fig. 1 is a schematic flowchart of a management method for deep learning framework training according to an embodiment of the present disclosure, and as shown in fig. 1, the method includes:
s110, obtaining source files corresponding to a plurality of deep learning frames.
In this embodiment, as those skilled in the art will understand, each deep learning framework ships with its own source files, which a user must later match with the corresponding training data, training environment, execution commands and so on. A new user therefore has to spend a long time becoming familiar with the framework's source code and building its training environment, which is inconvenient. In this embodiment, the source files corresponding to the multiple deep learning frameworks are obtained, i.e. one set of source files per framework, so that they can be analyzed and processed in the subsequent steps.
And S120, decoding the configuration file corresponding to the source file, and obtaining a configuration item of the configuration file according to a decoding result of the configuration file, wherein the configuration file comprises a data configuration file and a weight configuration file.
In this embodiment, the configuration files of the obtained source files are parsed, i.e. decoded, by a source-file parser to obtain their configuration items. As those skilled in the art will understand, the configuration files of each deep learning framework include a data configuration file and a weight configuration file. For example, the data configuration file includes at least one of a training set, a validation set, a test set, and a data padding format; the weight configuration file includes a weight file, the number of training iterations, the number of images per weight update, and the input image size. During model training, the data configuration file may use only a training set and a validation set and omit the test set and padding format. Similarly, a weight configuration file for training, for example, an electric-power prediction model only needs the number of training iterations and the weight file; unlike the training of an image recognition model, it has no use for a per-update image count or an input image size. This embodiment therefore does not limit which items the data and weight configuration files contain; they can be selected according to the requirements of the actual model training.
It should also be understood that the data configuration file and the weight configuration file obtained from the decoding operation can be modified accordingly, for example the number of training iterations and the input image size in the weight configuration file, or the sizes and ratio of the training set and test set in the data configuration file, as will be apparent to those skilled in the art.
S130, decoding the instruction execution file corresponding to the source file, and obtaining an instruction combination set according to the decoding result of the instruction execution file.
In this embodiment, following the same principle as step S120 above, the instruction execution files of the obtained source files are parsed by the source-file parser; the instruction execution files include, for example, a training execution file, a model-export execution file and a recognition execution file, and an instruction combination set is obtained from them. As those skilled in the art will understand, each instruction execution file of a deep learning framework has a corresponding instruction combination set, which illustratively includes an execution environment, a pre-execution command, a command file, and post-parameter commands, where the pre-execution command is generated from the execution environment, the command file to be executed is specified by the pre-execution command, and the pre-execution command or the command file is constrained by the post-parameters. Note that the order within the instruction combination set is: execution environment, pre-execution command, command file, post-parameter commands. With these four elements, operations such as model training, model export, model testing, data conversion and image operations can be realized; other existing operations can also be implemented on the same basis and are not repeated here.
S140, decoding the training environment file corresponding to the source file, and obtaining a program dependence package for constructing the training environment file according to the decoding result of the training environment file.
In this embodiment, the training environment file in the deep learning framework's source files is parsed. The program dependency packages of the versions the framework requires are downloaded automatically, and local upload of the training environment's dependency packages is also supported. From the decoding result of the training environment file, the program dependency package for constructing the training environment is obtained, an image of the training environment is built and stored in the training-environment shared library, and a download address for the image is generated. The shared library is a virtual image repository from which the image of the corresponding training environment can be exported, so that the training environment can be synchronized when several devices train together.
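As a sketch of this step, assuming Docker as the image tool and a LAN registry as the training-environment shared library (the patent names neither), the image could be built from the decoded dependency list and pushed to the shared library, yielding its download address:

```python
# Build an image of the training environment and store it in the shared
# library; registry host, repository layout and tag are assumptions.
import subprocess

def build_and_push_env_image(framework: str, context_dir: str,
                             registry: str = "registry.lan:5000") -> str:
    """context_dir holds the decoded dependency list (e.g. requirements, Dockerfile)."""
    tag = f"{registry}/train-env/{framework}:latest"
    subprocess.run(["docker", "build", "-t", tag, context_dir], check=True)
    subprocess.run(["docker", "push", tag], check=True)
    return tag  # the image download address bound to the framework

# download_address = build_and_push_env_image("framework-a", "./envs/framework-a")
```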
S150, storing the source code of the deep learning framework's source files in a code database, and obtaining the download address of the source code in the code database.
In this embodiment, the source code of the framework's source files is stored in the code synchronization library and a code download address is obtained. Framework code sharing is implemented through a code hosting center on the local area network: the code of every integrated deep learning framework is managed centrally and the repository address of each code base is stored, so that when several devices train jointly, the code can be fetched through its repository address to achieve synchronization.
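A sketch of registering a framework's source code with such a LAN code hosting center and recording the repository address as its download address. Standard git commands are used; the host name and repository path are assumptions.

```python
# Import framework source code into the LAN code hosting center and return
# the repository address through which other devices synchronize the code.
import subprocess

def register_source(framework: str, source_dir: str, host: str = "git.lan") -> str:
    repo_url = f"ssh://{host}/frameworks/{framework}.git"
    for args in (["git", "init"],
                 ["git", "add", "-A"],
                 ["git", "commit", "-m", f"import {framework}"],
                 ["git", "remote", "add", "origin", repo_url],
                 ["git", "push", "-u", "origin", "master"]):
        subprocess.run(args, cwd=source_dir, check=True)
    return repo_url  # stored as the framework's code download address
```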
And S160, binding the configuration item, the instruction combination set, the program dependence package and the download address of the source code of the source file with the corresponding deep learning framework, and storing the configuration item, the instruction combination set, the program dependence package and the download address of the source code in the data sharing library.
The obtained configuration items, instruction combination set, image download address and code download address are bound to the basic information of the corresponding deep learning framework and stored in the data sharing center. The data sharing center holds pre-training data, data generated during training, and post-training model data. Based on a file synchronization mechanism, only the files or directories that have changed are synchronized, and relative to the whole synchronized directory the changed data is very small each time. For example, when a training task is received, the synchronization operation is carried out according to the trigger files corresponding to the server's distribution nodes so as to obtain the training data.
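A minimal sketch of this change-only synchronization: file digests are compared against the last synchronized state and only files that differ are copied, in the spirit of rsync-style tools. The digest choice and state format are assumptions.

```python
# Synchronize only changed files by comparing content digests (illustrative).
import hashlib
import shutil
from pathlib import Path

def sync_changed(src: str, dst: str, state: dict) -> dict:
    """state maps relative path -> digest from the last sync; returns the new state."""
    new_state = {}
    for path in Path(src).rglob("*"):
        if not path.is_file():
            continue
        rel = str(path.relative_to(src))
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        new_state[rel] = digest
        if state.get(rel) != digest:              # copy only what changed
            target = Path(dst) / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
    return new_state
```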
Based on the operations of this embodiment, the extraction of information of different dimensions from the source files of a deep learning framework is completed. The process innovatively uses a deep learning framework source-file parser, manages the data of every dimension flexibly, generates reuse value, and reduces the labor cost of reuse. While using a deep learning framework, the user can focus on parameter configuration and optimization rather than on environment building, frequent command execution and data circulation.
It should be understood that steps S120 to S150 decode the source file in no fixed order. It should further be noted that the decoding in these four steps means traversing the source file with the source-file parser to extract the relevant keywords and key files, thereby obtaining the required configuration items, instruction combination set, program dependency packages and source-code download address. Unlike existing source-file parsers, the parser of this embodiment also provides a combined-command model beyond plain extraction, i.e. the instruction combination set obtained in step S130 and the configuration items obtained in step S120. The management method provided by this embodiment thus solves the problems of low efficiency, low integration, repeated command execution and inflexible switching among deep learning frameworks that arise when deep learning models are trained in the traditional way.
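As a sketch of the traversal just described, a parser might walk the source tree, select key files by name pattern, and scan them for known keywords. The file patterns and keywords below are assumptions for illustration.

```python
# Traverse a framework source tree to extract key files and keywords.
from pathlib import Path

KEY_FILES = {
    "configuration": ("*.cfg", "*.yaml"),
    "commands": ("*.sh",),
    "environment": ("requirements.txt", "Dockerfile"),
}
KEYWORDS = ("train", "batch", "iterations", "weights")

def scan_source_tree(root: str) -> dict:
    found = {category: [] for category in KEY_FILES}
    for category, patterns in KEY_FILES.items():
        for pattern in patterns:
            for path in Path(root).rglob(pattern):
                text = path.read_text(errors="ignore")
                found[category].append({
                    "file": str(path),
                    "keywords": [kw for kw in KEYWORDS if kw in text],
                })
    return found
```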
Further, as those skilled in the art will appreciate, the management method for deep learning framework training provided in this embodiment can also offer visual interaction with the data, e.g. data acquisition, data labeling, configuration-item management, training-task management, framework source-file management, permission control, training-log monitoring, model export and similar functions, i.e. a visual interface displayed on the terminal device. This belongs to conventional technical means of those skilled in the art, so how to implement the visual interaction is not described in detail here.
Referring to fig. 2, fig. 2 is a schematic flowchart of deep learning model training provided in the embodiment of the present application, and as shown in fig. 2, the method further includes:
s210, creating one or more deep learning model training tasks on one or more terminals, and matching corresponding deep learning frames according to the deep learning models.
In this embodiment, the terminal may be a notebook or desktop computer, without limitation here. Creating a training task for a deep learning model is a conventional operation and is not elaborated. It should be understood that a deep learning model is trained on an existing deep learning framework, so the precondition for training a model is that a framework is available for selection; the framework must be specified when the task is created and then guides the subsequent loading of training data and training configuration.
S220, loading training data according to the training task, and configuring training parameters according to the training data.
In this embodiment, training data matching the model of the training task is loaded: a deep learning model for image recognition needs image data, a prediction model for electric-power data needs historical power data, and a speech recognition model needs speech data. As those skilled in the art will understand, this configuration can be realized through operations such as data acquisition and labeling. For example, basic data is acquired and processed by a data-enhancement mechanism to produce the training data, which then serves as the data to be labeled; the enhancement includes random flipping, random cropping and random erasing. For picture data, rotating, cropping, enlarging and reducing the basic data expands the existing pictures to the order of magnitude that training requires. The training picture set is synchronized to the data sharing library and an accessible address is generated, which effectively reduces manual error in data circulation and improves its efficiency.
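The following sketch implements the named enhancements (random flipping, random cropping, random erasing) using Pillow, which is an assumed choice of image library; the probabilities and patch sizes are likewise assumptions.

```python
# Randomly flip, crop and erase an RGB image to expand the basic data.
import random
from PIL import Image, ImageOps

def augment(img: Image.Image) -> Image.Image:
    if random.random() < 0.5:                       # random flip
        img = ImageOps.mirror(img)
    if random.random() < 0.5:                       # random crop keeping 80% per side
        w, h = img.size
        cw, ch = int(w * 0.8), int(h * 0.8)
        x, y = random.randint(0, w - cw), random.randint(0, h - ch)
        img = img.crop((x, y, x + cw, y + ch))
    if random.random() < 0.5:                       # random erase: paste a grey patch
        w, h = img.size
        ew, eh = max(1, w // 5), max(1, h // 5)
        x, y = random.randint(0, w - ew), random.randint(0, h - eh)
        img.paste((128, 128, 128), (x, y, x + ew, y + eh))
    return img
```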
The data to be labeled is labeled with a preset auxiliary labeling strategy, and a label file is generated from the labeling result: when the data to be labeled is identified as image data, it is labeled based on image segmentation; when it is identified as speech data, it is labeled based on speech frame-sequence analysis; and when it is identified as video data, it is labeled based on video frame-sequence analysis. Data labeling covers point, line, box and polygon image-segmentation marks as well as speech and video annotation. In the prior art, data labeling has two processing modes, manual labeling and auxiliary labeling; manual labeling is conventional and is not repeated here. The auxiliary labeling innovatively uses a model-recognition-driven labeling mechanism: a deep learning model is trained on a small batch of data, and the remaining unlabeled data is recognized and predicted by that model, so the labeling of large volumes of data can be processed cyclically. Auxiliary labeling applies a different strategy to each data type, three in total: the first uses target detection for images; the second frames speech and then recognizes the frames; the third frames video and then recognizes the frames. After these steps are completed, a label file is generated from the auxiliarily identified data and stored.
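The three strategies can be read as a dispatch on data type, roughly as in the following sketch; the strategy bodies are placeholders for the small model trained on a hand-labeled batch.

```python
# Dispatch the auxiliary labeling strategy by data type (placeholders only).
def label_image(sample): ...   # image segmentation / target detection
def label_speech(sample): ...  # frame the audio, then recognize each frame
def label_video(sample): ...   # frame the video, then recognize each frame

STRATEGIES = {"image": label_image, "speech": label_speech, "video": label_video}

def auto_label(sample, data_type: str):
    try:
        return STRATEGIES[data_type](sample)
    except KeyError:
        raise ValueError(f"no auxiliary labeling strategy for {data_type!r}")
```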
S230, acquiring the memory resources of the one or more terminals; dynamically evaluating the memory occupied by existing training tasks, the memory remaining on the current terminal, and the memory required by the created training tasks; distributing the created training tasks to the terminals according to the evaluation result; and, for each terminal assigned a training task, synchronizing the training environment and training data the task requires together with the source code of the deep learning framework matched to the task, where the training data must be synchronized in real time for every training run, while the training environment and the framework source code need to be synchronized only once as long as the training task is unchanged.
In this embodiment, it should be understood that training an image model occupies the resources of the graphics processor (GPU) in the terminal device, i.e. the memory resources described in this embodiment; this is common knowledge for those skilled in the art and is not elaborated. It should also be understood that terminals with little or insufficient memory may be assigned no training task. Because every training task created in S210 is matched to a deep learning framework, and the above embodiment binds each framework's configuration items, instruction combination set, program dependency package and source-code download address and stores them in the data sharing library, this embodiment only needs to synchronize from the data sharing library the training environment and training data required by the current task and the source code of its matched framework; once the synchronization completes, the instruction to start training can be issued on the terminal.
And S240, triggering a model export instruction to export the trained deep learning model when the training task is finished.
In this embodiment, when the training task finishes, the end-of-training operation can be performed: the model optimization and export mechanism is triggered, an API call package and a visual model-verification interface are generated by default, and test data can be uploaded to test the model's precision. The access link to the log data and the access link to the generated model file are archived and bound to the training-task information.
Based on this deep learning model training embodiment, data is managed along the dimension of the training task, which gives better data isolation and enables full-life-cycle management of the data generated from the start of model training to its end, maximizing the output of high-precision deep learning models.
In one embodiment, samples on which the trained deep learning model's recognition rate falls below the recognition-rate threshold in the practical application scene are collected and exported; after the collected data reaches the quantity threshold, a deep learning model retraining mechanism is started, the model is retrained on the collected data together with the training data, and the retrained deep learning model is exported.
In this embodiment, model optimization is performed on the basis of the exported deep learning model, including a minimization operation that removes redundant parts. Based on the model-recognition retraining mechanism, samples whose recognition rate is below the threshold in the actual application scene are collected and exported; once the collected data reaches the quantity threshold, the retraining mechanism starts, the model is retrained on the collected data together with the training data, and the retrained model is exported. This effectively improves the model's recognition accuracy and strengthens its generalization ability and robustness.
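A minimal sketch of the retraining trigger, assuming both thresholds are plain configuration values (the patent leaves their magnitudes open):

```python
# Collect low-recognition-rate samples and signal retraining at a threshold.
class RetrainCollector:
    def __init__(self, recognition_threshold: float = 0.6,
                 quantity_threshold: int = 500):
        self.recognition_threshold = recognition_threshold
        self.quantity_threshold = quantity_threshold
        self.collected = []

    def observe(self, sample, recognition_rate: float) -> bool:
        """Record a low-confidence sample; True means retraining should start."""
        if recognition_rate < self.recognition_threshold:
            self.collected.append(sample)
        return len(self.collected) >= self.quantity_threshold

    def drain(self) -> list:
        """Hand the collected samples to retraining and reset the buffer."""
        batch, self.collected = self.collected, []
        return batch
```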
Referring to fig. 3, fig. 3 is a block diagram of a management system for deep learning framework training according to an embodiment of the present application, and as shown in fig. 3, the system includes:
a source file obtaining module 310, configured to obtain source files corresponding to multiple deep learning frameworks;
the first decoding module 320 is configured to decode a configuration file corresponding to the source file, and obtain a configuration item of the configuration file according to a decoding result of the configuration file, where the configuration file includes a data configuration file and a weight configuration file;
the second decoding module 330 is configured to decode the instruction execution file corresponding to the source file, and obtain an instruction combination set according to a decoding result of the instruction execution file;
the third decoding module 340 is configured to decode the training environment file corresponding to the source file, and obtain a program dependency package for constructing the training environment file according to the decoding result of the training environment file;
an address obtaining module 350, configured to store a source code of a source file of the deep learning framework in a code database, and obtain a download address of the source code in the code database;
and the data information binding module 360 is used for binding the configuration items, the instruction combination set, the program dependency package and the download addresses of the source codes of the source files with the corresponding deep learning framework and storing the configuration items, the instruction combination set, the program dependency package and the download addresses of the source codes in the data sharing library.
It can be seen that the management system for deep learning framework training provided by the above embodiment obtains the source files of the deep learning frameworks and analyzes the data of different dimensions in them by decoding, completing the extraction of multi-dimensional information from the source files and binding the data of each dimension to the corresponding framework. This streamlines the reuse of the frameworks: the user focuses on parameter configuration and optimization rather than on building a training environment, frequent command execution and training-data circulation, which effectively reduces the difficulty of training deep learning models and accelerates their standardized integration.
The embodiment of the application also provides a computer terminal comprising one or more processors and a memory coupled to the processor(s) for storing one or more programs; when executed by the one or more processors, the programs cause the processors to implement the steps of the management method for deep learning framework training described in the above embodiments. The processor may be a central processing unit or another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. As the computing and control core of the terminal, it is adapted to implement one or more instructions, in particular to load and execute instructions from a computer storage medium so as to realize the corresponding method flow or function; the processor of this embodiment can be used to perform the operations of the management method for deep learning framework training.
In yet another embodiment of the present invention, a readable storage medium, specifically a computer-readable storage medium (memory), is provided; it is the memory device in a computer device that stores programs and data. The computer-readable storage medium here can include both the built-in storage medium of the computer device and any extended storage medium the device supports. The storage medium provides storage space that holds the operating system of the terminal, along with one or more instructions, which may be one or more computer programs (including program code), suitable for being loaded and executed by the processor. Note that the computer-readable storage medium may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The instructions stored in the medium can be loaded and executed by a processor to implement the corresponding steps of the management method for deep learning framework training in the above embodiments. As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product; accordingly, the invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects, and may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (9)
1. A management method for deep learning framework training, which is characterized by comprising the following steps:
obtaining source files corresponding to a plurality of deep learning frameworks;
decoding a configuration file corresponding to the source file, and obtaining a configuration item of the configuration file according to a decoding result of the configuration file, wherein the configuration file comprises a data configuration file and a weight configuration file;
decoding an instruction execution file corresponding to the source file, and obtaining an instruction combination set according to a decoding result of the instruction execution file;
decoding a training environment file corresponding to the source file, and obtaining a program dependency package for constructing the training environment file according to the decoding result of the training environment file;
storing source codes of source files of the deep learning framework into a code database, and acquiring download addresses of the source codes in the code database;
binding the configuration items, the instruction combination set, the program dependency package and the download addresses of the source codes of the source files with the corresponding deep learning frameworks, and storing them in a data sharing library;
the method further comprises the following steps: creating one or more deep learning model training tasks on one or more terminals, and matching a corresponding deep learning framework according to each deep learning model;
loading training data according to the training task, and configuring training parameters according to the training data;
acquiring memory resources of one or more terminals; dynamically evaluating the memory resources based on the size of the memory resources occupied by the existing training tasks, the size of the memory resources remaining on the current terminal and the size of the memory resources required by the created training tasks; distributing the created training tasks to each terminal according to the evaluation results; and synchronizing, for the terminals to which training tasks are distributed, the training environment and training data required by the training task and the source code of the deep learning framework matched with the training task, wherein the training data needs to be synchronized in real time for each training run, and the training environment and the source code of the deep learning framework only need to be synchronized once as long as the training task is unchanged;
and when the training task is finished, triggering a model export instruction to export the trained deep learning model.
2. The method for managing deep learning framework training according to claim 1, wherein the data configuration file comprises at least one of a training set, a validation set, a test set and a data padding format (any one, any two, any three or all four);
the weight profile includes a weight file, a number of training iterations, a weight update image amount, and an input image size.
3. The method of claim 1, wherein the instruction combination set comprises an execution environment, a pre-execution command, a command file, and post-parameter commands, wherein the pre-execution command is generated according to the execution environment, the command file to be executed is specified according to the pre-execution command, and the pre-execution command or the command file is constrained according to the post-parameters.
4. The method as claimed in claim 1, wherein a program dependency package for constructing the training environment file is obtained according to the decoding result of the training environment file, an image of the training environment is created and stored in a training environment shared library, and a download address of the image is generated, wherein the training environment shared library is a virtual image repository based on the training environment, from which the image of the corresponding training environment can be exported.
5. The management method for deep learning framework training according to claim 1, wherein the loading of training data according to the training task specifically comprises:
acquiring basic data, processing the basic data based on a data enhancement mechanism to obtain training data, and taking the training data as the data to be labeled, wherein the data enhancement comprises random flipping, random cropping and random erasing;
labeling the data to be labeled by a preset auxiliary labeling strategy, and generating a label file according to the labeling result of the data to be labeled; when the data to be labeled is identified as image data, labeling it based on image segmentation; when the data to be labeled is identified as speech data, labeling it based on speech frame-sequence analysis; and when the data to be labeled is identified as video data, labeling it based on video frame-sequence analysis.
6. The management method for deep learning framework training according to claim 1, wherein data on which the trained deep learning model's recognition rate is lower than the recognition rate threshold in an actual application scene is collected and exported; after the collected data reaches the quantity threshold, a deep learning model retraining mechanism is started, the collected data and the training data are used to retrain the model, and the retrained deep learning model is exported.
7. A management system for deep learning framework training, comprising:
the source file acquisition module is used for acquiring source files corresponding to the deep learning frameworks;
the first decoding module is used for decoding the configuration file corresponding to the source file and obtaining the configuration items of the configuration file according to the decoding result of the configuration file, wherein the configuration file comprises a data configuration file and a weight configuration file;
the second decoding module is used for decoding the instruction execution file corresponding to the source file and obtaining an instruction combination set according to the decoding result of the instruction execution file;
the third decoding module is used for decoding the training environment file corresponding to the source file and obtaining a program dependency package for constructing the training environment file according to the decoding result of the training environment file;
the address acquisition module is used for storing the source code of the source file of the deep learning frame into the code database and acquiring the download address of the source code in the code database;
the data information binding module is used for binding the configuration items, the instruction combination set, the program dependency package and the download addresses of the source codes of the source files with the corresponding deep learning framework, and storing them in the data sharing library;
the system further comprises: a training module, used for creating training tasks of one or more deep learning models on one or more terminals and matching a corresponding deep learning framework according to each deep learning model; loading training data according to the training task, and configuring training parameters according to the training data; acquiring memory resources of one or more terminals; dynamically evaluating the memory resources based on the size of the memory resources occupied by the existing training tasks, the size of the memory resources remaining on the current terminal and the size of the memory resources required by the created training tasks; distributing the created training tasks to each terminal according to the evaluation results; and synchronizing, for the terminals to which training tasks are distributed, the training environment and training data required by the training task and the source code of the deep learning framework matched with the training task, wherein the training data needs to be synchronized in real time for each training run, and the training environment and the source code of the deep learning framework only need to be synchronized once as long as the training task is unchanged; and when the training task is finished, triggering a model export instruction to export the trained deep learning model.
8. A computer terminal, comprising: memory and a processor, the memory having stored thereon a computer program, the computer program being executable by the processor to cause the processor to implement a method of managing deep learning framework training as claimed in any one of claims 1 to 6.
9. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of a method for managing deep learning framework training according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211617170.5A CN115618239B (en) | 2022-12-16 | 2022-12-16 | Management method, system, terminal and medium for deep learning framework training |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211617170.5A CN115618239B (en) | 2022-12-16 | 2022-12-16 | Management method, system, terminal and medium for deep learning framework training |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115618239A CN115618239A (en) | 2023-01-17 |
CN115618239B true CN115618239B (en) | 2023-04-11 |
Family
ID=84880584
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211617170.5A Active CN115618239B (en) | 2022-12-16 | 2022-12-16 | Management method, system, terminal and medium for deep learning framework training |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115618239B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021136365A1 (en) * | 2019-12-30 | 2021-07-08 | 第四范式(北京)技术有限公司 | Application development method and apparatus based on machine learning model, and electronic device |
CN113592017A (en) * | 2021-08-10 | 2021-11-02 | 菲特(天津)检测技术有限公司 | Deep learning model standardization training method, management system and processing terminal |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113762504A (en) * | 2017-11-29 | 2021-12-07 | 华为技术有限公司 | Model training system, method and storage medium |
CN108920259B (en) * | 2018-03-30 | 2022-06-24 | 华为云计算技术有限公司 | Deep learning job scheduling method, system and related equipment |
CN110780914B (en) * | 2018-07-31 | 2022-12-27 | 中国移动通信集团浙江有限公司 | Service publishing method and device |
CN109146084B (en) * | 2018-09-06 | 2022-06-07 | 郑州云海信息技术有限公司 | Machine learning method and device based on cloud computing |
CN109857475B (en) * | 2018-12-27 | 2020-06-16 | 深圳云天励飞技术有限公司 | Framework management method and device |
CN112835609B (en) * | 2019-11-22 | 2024-04-05 | 北京沃东天骏信息技术有限公司 | Method and device for modifying download address of dependent packet |
CN111209077A (en) * | 2019-12-26 | 2020-05-29 | 中科曙光国际信息产业有限公司 | Deep learning framework design method |
CN111310934B (en) * | 2020-02-14 | 2023-10-17 | 北京百度网讯科技有限公司 | Model generation method and device, electronic equipment and storage medium |
- 2022-12-16: application CN202211617170.5A (CN) granted as patent CN115618239B, status Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021136365A1 (en) * | 2019-12-30 | 2021-07-08 | 第四范式(北京)技术有限公司 | Application development method and apparatus based on machine learning model, and electronic device |
CN113592017A (en) * | 2021-08-10 | 2021-11-02 | 菲特(天津)检测技术有限公司 | Deep learning model standardization training method, management system and processing terminal |
Also Published As
Publication number | Publication date |
---|---|
CN115618239A (en) | 2023-01-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |