CN110533181B - Rapid training method and system for deep learning model

Info

Publication number: CN110533181B
Application number: CN201910676874.1A
Authority: CN
Inventors: 赵铭; 林镇锋; 易文峰; 杨育; 杨正刚; 李小芬; 徐文娟
Original assignee: China Southern Power Grid Digital Platform Technology Guangdong Co ltd
Current assignee: China Southern Power Grid Digital Platform Technology Guangdong Co ltd
Priority date: 2019-07-25
Filing date: 2019-07-25
Publication date: 2023-07-18
Anticipated expiration: 2039-07-25
Also published as: CN110533181A

Abstract

A method and system for rapidly training a deep learning model. The rapid training method comprises the following steps: pre-installing a training environment on a training server; the training server retrieving, from a file server, training code that has been imported into the file server in advance, and using it to train the deep learning model; and connecting the file server to a display terminal that displays a control interface, wherein the control interface comprises a training code update tab. Because the training environment is pre-installed on the training server, no environment installation or preparation is needed when training the deep learning model, which greatly improves working efficiency compared with the original approach (in which an ordinary technician needs at least 7 days to install and deploy the environment). In addition, when the training code needs to be updated, the training server can automatically download the latest training code from the file server and run it in place of the old code, so that the training code is updated rapidly.

Description

Rapid training method and system for deep learning model

Technical Field

The invention relates to the technical field of information, in particular to a method and a system for quickly training a deep learning model.

Background

Deep learning is a new field in machine learning research. Its motivation is to build neural networks that simulate the way the human brain analyzes and learns, mimicking the mechanisms by which the human brain interprets data such as images, sound, and text. Deep learning models, such as convolutional neural networks (CNNs), require training on large amounts of data before they become practical. Conventionally, training a custom deep learning model requires the following steps: writing code, installing an environment, running a local test, running on a server, adjusting the parameters of the code on the server, and tracking the training results. Installing the environment, adjusting the parameters, and tracking the training results take a great deal of time and effort each time. How to accelerate the training process of deep learning models has therefore become an urgent need in the industry.

Disclosure of Invention

To address the problems in the prior art, improve training efficiency, and reduce the training time of deep learning models, the invention provides a rapid training method for a deep learning model. With this method, environment setup, code updating, and parameter adjustment can all be handled in a single interface, so that each adjustment can be completed within a few minutes, which greatly improves working efficiency.

The technical solution adopted by the invention to solve the technical problem is as follows: a rapid training method for a deep learning model, comprising the following steps:

pre-installing a training environment on a training server;

the training server retrieving, from a file server, training code that has been imported into the file server in advance, to train the deep learning model;

and connecting the file server by signal to a display terminal that displays a control interface, wherein the control interface comprises a training code update tab.

In the rapid training method of the deep learning model provided by the invention, training data generated in the training process of the deep learning model is stored in the file server, and the file server displays the training data through the control interface.

In the rapid training method of the deep learning model provided by the invention, the log generated in the training process of the deep learning model is stored in the file server, and the file server displays the log through the control interface.

In the rapid training method of the deep learning model provided by the invention, the control interface further comprises a training parameter adjustment tab, and the updated training parameters are transmitted to the training server as messages via a message queue.

In the rapid training method of the deep learning model provided by the invention, the training parameter adjustment tab comprises a learning rate adjustment tab and an iteration number tab.

In the rapid training method of the deep learning model provided by the invention, the pre-installed training environment already provides the relevant dependent components or software.

In the rapid training method of the deep learning model provided by the invention, the rapid training method comprises a step of customizing the environment directly through the control interface when relevant dependent components or software are missing from the pre-installed training environment.

In the rapid training method of the deep learning model provided by the invention, the step of customizing the environment directly through the control interface when relevant dependent components or software are missing from the pre-installed training environment comprises:

the training server automatically generating a Dockerfile according to the existing dependent components, and automatically downloading the relevant dependent components through the Dockerfile;

adding the new dependent components to the base image to generate a user-defined running environment.

In order to solve the technical problem, the invention also provides a rapid training system of the deep learning model, which comprises a training server, a file server in signal connection with the training server and a display terminal connected with the file server, wherein the training server is preloaded with a training environment, training codes are stored in the file server, and a control interface is displayed on the display terminal and comprises a training code updating tab.

In the rapid training system of the deep learning model provided by the invention, the control interface further comprises a training parameter adjustment tab, and the updated training parameters are transmitted to the training server as messages via a message queue; the training parameter adjustment tab includes a learning rate adjustment tab and an iteration number tab.

By implementing the rapid training method of the deep learning model, the following beneficial effects can be achieved:

1. The rapid training method comprises the following steps: pre-installing a training environment on a training server; the training server retrieving, from a file server, training code that has been imported into the file server in advance, to train the deep learning model; and connecting the file server by signal to a display terminal that displays a control interface, wherein the control interface comprises a training code update tab. Because the training environment is pre-installed on the training server, no environment installation or preparation is needed when training the deep learning model, which greatly improves working efficiency compared with the original approach (in which an ordinary technician needs at least 7 days to install and deploy the environment). In addition, the training code is delivered to the training server through the file server: when the training code needs to be updated, the new training code is simply uploaded to the file server, and the training server automatically downloads the latest training code from the file server, overwrites the old code, and runs it, so that the update is completed rapidly.

2. Training data generated during training of the deep learning model is stored in the file server, and the file server displays the training data through the control interface. In this way, the training process can be tracked through the control interface without having to inspect the training server.

3. The log generated during training of the deep learning model is stored in the file server, and the file server displays the log through the control interface. The log can therefore be read directly in the control interface.

4. The control interface further comprises a training parameter adjustment tab, and the updated training parameters are transmitted to the training server as messages via a message queue. Preferably, the training parameter adjustment tab comprises a learning rate adjustment tab and an iteration number tab. In this way, parameters can be adjusted during training by operating the training code update tab in the control interface, and each adjustment can be completed within 5 minutes; compared with the original approach (in which each adjustment takes at least 1 hour), working efficiency is greatly improved.

5. When relevant dependent components or software are missing from the pre-installed training environment, the environment is customized directly through the control interface. This step comprises: the training server automatically generating a Dockerfile according to the existing dependent components and automatically downloading the relevant dependent components through the Dockerfile; and adding the new dependent components to the base image to generate a user-defined running environment. Compared with the original approach to modifying the environment, which takes at least 2 days, this process customizes the environment in only a few minutes, which greatly improves working efficiency.

Correspondingly, the rapid training system of the deep learning model comprises a training server, a file server in signal connection with the training server and a display terminal connected with the file server, wherein a training environment is preloaded in the training server, training codes are stored in the file server, a control interface is displayed on the display terminal, and the control interface comprises a training code updating tab. It can be seen that the rapid training system corresponds to the rapid training method, and thus, the same technical effects can be achieved by implementing the rapid training system.

Drawings

FIG. 1 is a flow chart showing steps of a fast training method according to a first embodiment of the present invention;

FIG. 2 is a diagram of an updated training parameter interface of a control interface according to a first embodiment of the present invention;

FIG. 3 is an environment customization interface of the control interface according to a first embodiment of the present invention;

FIG. 4 is a block diagram of a rapid training system according to a second embodiment of the present invention.

Detailed Description

For a clearer understanding of technical features, objects and effects of the present invention, a detailed description of embodiments of the present invention will be made with reference to the accompanying drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1

The embodiment provides a rapid training method of a deep learning model, referring to fig. 1, the rapid training method includes the following steps:

pre-installing a training environment in a training server;

and connecting the file server to a display terminal that displays a control interface, wherein the control interface comprises a training code update tab, as shown in FIG. 2.

The training server may employ a server or a cluster of servers.

The file server may employ a server or a cluster of servers.

Because the training environment is pre-installed on the training server, no environment installation or preparation is needed when training the deep learning model, which greatly improves working efficiency compared with the original approach (in which an ordinary technician needs at least 7 days to install and deploy the environment). In addition, the training code is delivered to the training server through the file server: when the training code needs to be updated, the new training code is uploaded to the file server and the training code update tab is selected, whereupon the training server automatically downloads the latest training code from the file server, overwrites the old code, and runs it, so that the training code is updated rapidly. Referring to FIG. 2, which shows the training parameter update interface of the control interface, the training code update tab is the "update FTP training code" option.
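The update flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the file server is modeled as a local directory, and the function name `sync_training_code` and the directory layout are hypothetical. A real deployment would fetch the files over FTP or a similar protocol, as the "update FTP training code" option suggests.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_training_code(file_server_dir: Path, training_dir: Path) -> list[str]:
    """Copy new or changed files from the file server over the local copy.

    Returns the relative paths that were created or overwritten.
    """
    updated = []
    for src in sorted(file_server_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(file_server_dir)
        dst = training_dir / rel
        if dst.exists() and file_digest(dst) == file_digest(src):
            continue  # unchanged, nothing to overwrite
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # overwrite the old training code
        updated.append(rel.as_posix())
    return updated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        server = Path(tmp) / "file_server"          # hypothetical layout
        worker = Path(tmp) / "training_server"
        server.mkdir()
        worker.mkdir()
        (server / "train.py").write_text("print('v1')\n")
        print(sync_training_code(server, worker))   # first download
        (server / "train.py").write_text("print('v2')\n")
        print(sync_training_code(server, worker))   # picks up the new version
```

Comparing content digests rather than timestamps is a design choice of this sketch: it keeps the sync idempotent, so triggering the update tab twice in a row does not copy anything the second time.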

It should be noted that, to ensure that training completes efficiently and smoothly, the pre-installed training environment must already provide the dependent components or software that the training relies on.

Of course, if the pre-installed training environment lacks relevant dependent components or software, the training environment can be customized on the basis of the pre-installed one. Specifically, the rapid training method comprises a step of customizing the environment directly through the control interface when relevant dependent components or software are missing from the pre-installed training environment. This step comprises: the training server automatically generating a Dockerfile according to the existing dependent components and automatically downloading the relevant dependent components through the Dockerfile; and adding the new dependent components to the base image to generate a user-defined running environment. The control interface includes an environment customization interface, shown in FIG. 3, which contains a "select base environment" field, an "existing dependency" field, an "add dependency" field, an "environment name" field, an "environment version" field, and an "environment description" field. The above steps can be performed in this environment customization interface.
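One way the Dockerfile generation step might look is sketched below. The patent does not specify the Dockerfile contents, so everything here is an assumption made for illustration: the `EnvironmentSpec` class, the `render_dockerfile` function, the `env.*` label names, the image name `example/dl-base:1.0`, and the premise that the added dependencies are Python packages installable with `pip`.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EnvironmentSpec:
    """Mirrors the fields of the environment customization form (FIG. 3)."""
    base_image: str            # "select base environment"
    name: str                  # "environment name"
    version: str               # "environment version"
    description: str = ""      # "environment description"
    existing_deps: list[str] = field(default_factory=list)  # "existing dependency"
    added_deps: list[str] = field(default_factory=list)     # "add dependency"


def render_dockerfile(spec: EnvironmentSpec) -> str:
    """Return Dockerfile text that layers the new dependencies on the base image."""
    # Skip anything the base environment already provides.
    new_deps = [d for d in spec.added_deps if d not in spec.existing_deps]
    lines = [
        f"FROM {spec.base_image}",
        # json.dumps yields a double-quoted, escaped string for each label value.
        f"LABEL env.name={json.dumps(spec.name)} env.version={json.dumps(spec.version)}",
    ]
    if spec.description:
        lines.append(f"LABEL env.description={json.dumps(spec.description)}")
    if new_deps:
        # Assumption: dependencies are Python packages installable with pip.
        lines.append("RUN pip install --no-cache-dir " + " ".join(new_deps))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    spec = EnvironmentSpec(
        base_image="example/dl-base:1.0",   # hypothetical image name
        name="custom-env",
        version="1.1",
        existing_deps=["numpy"],
        added_deps=["numpy", "pandas"],
    )
    print(render_dockerfile(spec), end="")
```

The generated text could then be built with a command such as `docker build -t custom-env:1.1 .`, which downloads the added dependencies on top of the base image.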

Further, the control interface also includes a training parameter adjustment tab (see FIG. 2, where it is displayed as "start parameter"), and the updated training parameters are transmitted to the training server as messages via a message queue. Preferably, referring to FIG. 2, the training parameter adjustment tab includes at least a learning rate adjustment tab and an iteration number tab. In this way, parameters can be adjusted during training by operating the training code update tab in the control interface, and each adjustment can be completed within 5 minutes; compared with the original approach of making the adjustment on the training server (in which each adjustment takes at least 1 hour), working efficiency is greatly improved.
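The message-queue hand-off can be illustrated with a small sketch. The patent does not name a message broker or message format, so the following assumes an in-process `queue.Queue` as a stand-in for the broker and plain dictionaries as messages; `TrainingParams`, `apply_pending_updates`, and `run_training` are hypothetical names, and the "training step" is a placeholder.

```python
import queue
from dataclasses import dataclass


@dataclass
class TrainingParams:
    learning_rate: float
    iterations: int


def apply_pending_updates(params: TrainingParams, updates: queue.Queue) -> None:
    """Drain the queue and apply every pending parameter message in order."""
    while True:
        try:
            message = updates.get_nowait()
        except queue.Empty:
            return
        if "learning_rate" in message:
            params.learning_rate = float(message["learning_rate"])
        if "iterations" in message:
            params.iterations = int(message["iterations"])


def run_training(params: TrainingParams, updates: queue.Queue, on_step=None) -> list:
    """Toy loop that checks the queue before every iteration.

    Returns the learning rate used at each step (a stand-in for real training).
    """
    used_rates = []
    step = 0
    while True:
        apply_pending_updates(params, updates)
        if step >= params.iterations:
            break
        used_rates.append(params.learning_rate)  # placeholder for one training step
        if on_step is not None:
            on_step(step)
        step += 1
    return used_rates


if __name__ == "__main__":
    updates = queue.Queue()

    def control_interface(step: int) -> None:
        # Simulates the user submitting new values after the first step.
        if step == 0:
            updates.put({"learning_rate": 0.01, "iterations": 4})

    print(run_training(TrainingParams(0.1, 3), updates, control_interface))
```

Polling the queue between iterations means a new learning rate or iteration count takes effect at the next step without restarting the training process, which is the property the paragraph above relies on.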

Further, training data generated during training of the deep learning model is stored in the file server, and the file server displays the training data through the control interface. In this way, the training process can be tracked through the control interface without having to inspect the training server.
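As a rough illustration of this arrangement, the sketch below assumes that the training data consists of per-iteration records such as a step number and a loss value, and that the file server exposes a shared directory. The JSON Lines file format and the names `append_record` and `read_records` are assumptions made for the example; the patent does not specify a storage format.

```python
import json
import tempfile
from pathlib import Path


def append_record(metrics_file: Path, step: int, loss: float) -> None:
    """Training-server side: append one training record as a JSON line."""
    with metrics_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"step": step, "loss": loss}) + "\n")


def read_records(metrics_file: Path) -> list[dict]:
    """Control-interface side: load all records stored so far for display."""
    if not metrics_file.exists():
        return []
    with metrics_file.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        metrics = Path(tmp) / "metrics.jsonl"   # hypothetical path on the file server
        append_record(metrics, 1, 0.9)
        append_record(metrics, 2, 0.7)
        for record in read_records(metrics):
            print(record["step"], record["loss"])
```

An append-only file lets the display side read progress at any moment while training continues to write.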

Further, the log generated during training of the deep learning model is stored in the file server, and the file server displays the log through the control interface. The log can therefore be read directly in the control interface.
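A minimal sketch of the log path, assuming the training process writes its log to a file in a directory backed by the file server and the control interface shows the most recent lines. It uses Python's standard `logging` module; the helper names `make_training_logger` and `tail_log`, the logger name, and the log format are hypothetical.

```python
import logging
import tempfile
from collections import deque
from pathlib import Path


def make_training_logger(log_file: Path, name: str = "training") -> logging.Logger:
    """Training-server side: create a logger that writes to the given file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's handlers
    handler = logging.FileHandler(log_file, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def tail_log(log_file: Path, max_lines: int = 50) -> list[str]:
    """Control-interface side: return the last max_lines lines of the log."""
    with log_file.open(encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=max_lines)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = Path(tmp) / "train.log"      # hypothetical path on the file server
        logger = make_training_logger(log_path)
        for epoch in range(1, 4):
            logger.info("epoch %d finished", epoch)
        print(tail_log(log_path, max_lines=2))
        for handler in list(logger.handlers):   # release the file before cleanup
            handler.close()
            logger.removeHandler(handler)
```

Reading only the tail keeps the display responsive even when a long training run produces a large log file.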

In summary, the implementation of the rapid training method for the deep learning model provided by the invention has at least the following beneficial effects:

1. The file server manages the training code and training data in a unified way. The file server is connected to both the display terminal and the training server, so the training server can obtain the training code from the file server, and the training data can be shown on the display terminal. In this way, the training code is easy to update and the training data is easy to track.

2. The control interface on the display terminal includes a training parameter adjustment tab, so the training parameters do not have to be adjusted on the training server, which greatly shortens the time needed for parameter adjustment.

3. The training server is preset with a training environment and supports rapid customization of the training environment when the preset environment lacks relevant dependent components or software.

Example 2

This embodiment provides a rapid training system for a deep learning model. The rapid training method of Example 1 may be implemented by this rapid training system.

In this embodiment, as shown in FIG. 4, the rapid training system includes a training server, a file server in signal connection with the training server, and a display terminal connected to the file server. A training environment is pre-installed on the training server, training code is stored in the file server, and a control interface is displayed on the display terminal; the control interface includes a training code update tab. The control interface also includes a training parameter adjustment tab, and the updated training parameters are transmitted to the training server as messages via a message queue; the training parameter adjustment tab includes a learning rate adjustment tab and an iteration number tab. The training server may be a single server or a cluster of servers, and so may the file server.

Because the training environment is pre-installed on the training server, no environment installation or preparation is needed when training the deep learning model, which greatly improves working efficiency compared with the original approach (in which an ordinary technician needs at least 7 days to install and deploy the environment). In addition, the training code is delivered to the training server through the file server: when the training code needs to be updated, the new training code is simply uploaded to the file server, and the training server automatically downloads the latest training code from the file server, overwrites the old code, and runs it, so that the update is completed rapidly.

It should be noted that, to ensure that training completes efficiently and smoothly, the pre-installed training environment must already provide the dependent components or software that the training relies on.

Of course, if the pre-installed training environment lacks relevant dependent components or software, the training environment can be customized on the basis of the pre-installed one. Specifically, the rapid training method comprises a step of customizing the environment directly through the control interface when relevant dependent components or software are missing from the pre-installed training environment. This step comprises: the training server automatically generating a Dockerfile according to the existing dependent components and automatically downloading the relevant dependent components through the Dockerfile; and adding the new dependent components to the base image to generate a user-defined running environment.

In this embodiment, the control interface further includes a training parameter adjustment tab, and the updated training parameters are transmitted to the training server as messages via a message queue. Preferably, referring to FIG. 2, the training parameter adjustment tab includes at least a learning rate adjustment tab and an iteration number tab. In this way, parameters can be adjusted during training by operating the training code update tab in the control interface, and each adjustment can be completed within 5 minutes; compared with the original approach of making the adjustment on the training server (in which each adjustment takes at least 1 hour), working efficiency is greatly improved.

In this embodiment, training data generated during training of the deep learning model is stored in the file server, and the file server displays the training data through the control interface. In this way, the training process can be tracked through the control interface without having to inspect the training server.

In this embodiment, the log generated during training of the deep learning model is stored in the file server, and the file server displays the log through the control interface. The log can therefore be read directly in the control interface.

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to these embodiments, which are merely illustrative and not restrictive. Those of ordinary skill in the art may devise many other forms without departing from the spirit of the present invention and the scope of the claims, and all such forms fall within the protection of the present invention.

Claims

1. A training method of a deep learning model, the training method comprising the steps of:

pre-installing a training environment in a training server;

the file server is in signal connection with a display terminal displaying a control interface, wherein the control interface comprises a training code update tab;

training data generated in the deep learning model training process are stored in the file server, and the file server displays the training data through the control interface;

the logs generated in the training process of the deep learning model are stored in the file server, and the file server displays the logs through the control interface;

the control interface further comprises a training parameter adjustment tab, and the updated training parameters are transmitted to the training server as messages via a message queue;

the training parameter adjustment tab comprises a learning rate adjustment tab and an iteration number tab;

relevant dependent components or software are already provided in the pre-installed training environment;

the training method comprises the step of directly customizing through the control interface when relevant dependent components or software are absent in the preloaded training environment;

the step of directly customizing the control interface when the relevant dependent components or software are absent in the preloaded training environment comprises the following steps:

adding new dependent components to the base image to generate a user-defined running environment, and handling the training parameter adjustment tab and the training code update tab in one interface;

and uploading the training code to the training server through the file server, wherein, if the training code needs to be updated, the new training code is uploaded directly to the file server, and the training server automatically downloads the latest training code from the file server, overwrites the old code, and runs it, thereby completing the update of the training code.

2. A training system of a deep learning model, which is used for the training method of the deep learning model according to claim 1, and is characterized in that the training system comprises a training server, a file server connected with the training server in a signal manner and a display terminal connected with the file server, wherein the training server is preloaded with a training environment, training codes are stored in the file server, and a control interface is displayed on the display terminal and comprises a training code updating tab;

the control interface further comprises a training parameter adjustment tab, and the updated training parameters are transmitted to the training server as messages via a message queue; the training parameter adjustment tab comprises a learning rate adjustment tab and an iteration number tab;

if the pre-installed training environment lacks relevant dependent components or software, the training environment is customized on the basis of the pre-installed training environment.