CN112465048A - Deep learning model training method, device, equipment and storage medium - Google Patents

Deep learning model training method, device, equipment and storage medium

Info

Publication number
CN112465048A
Authority
CN
China
Prior art keywords
training
node
nodes
model
model parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011404833.6A
Other languages
Chinese (zh)
Inventor
赵仁明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202011404833.6A priority Critical patent/CN112465048A/en
Publication of CN112465048A publication Critical patent/CN112465048A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a deep learning model training method, device, equipment and storage medium. The method comprises the following steps: configuring a plurality of training nodes to perform model training using their respective local sample data, and counting the training times of each training node; after each training pass, acquiring the training times of the node and of the other nodes, and comparing the training times of the node with those of the other nodes; in response to the training times of some other node being greater than the training times of the node, updating the model parameters of the node with the model parameters of that node, the node then performing model training with the updated model parameters and local sample data, and updating the model parameters and training times; and in response to the training times of the other nodes being less than or equal to the training times of the node, performing model training with the model parameters updated in the node's previous training and the local sample data, and updating the model parameters and training times. The method achieves fast model convergence, high training efficiency and strong robustness.

Description

Deep learning model training method, device, equipment and storage medium
Technical Field
The invention belongs to the field of information security, and particularly relates to a deep learning model training method, device, equipment and storage medium.
Background
Deep learning has been widely applied in the real world, for example in unmanned vehicles, receipt recognition, movie recommendation, and the like. Deep learning requires a large amount of data: for neural networks, the number of training samples has a great influence on the quality of training. To improve the accuracy of a model, a larger number of data samples is typically used for training. However, collecting training samples is often not easy, especially for strongly privacy-sensitive data such as medical data and personal financial data. For example, collecting and labeling a 3D brain MRI image data sample requires a trained neurologist about a week to complete. A DNN model cannot be trained effectively with only a small number of data samples; meanwhile, gathering such data together for training is usually restricted by ethical and policy regulations and cannot be carried out effectively.
FIG. 1 shows the conventional centralized deep learning training approach, in which a central server is provided to collect and aggregate the parameters generated by the model training of each training node. After receiving all the parameters, the central server averages them and distributes the averaged new parameters to all training nodes; each training node, after receiving the updated parameters, updates its local model parameters and starts the next round of training. This traditional centralized deep learning training mode depends heavily on the central server for parameter exchange, so the pressure on the central server is high, and if the central server fails, the model training of every node is affected. In addition, the training speeds of the nodes differ, and the central server can distribute new parameters only after all nodes have finished training, so model convergence is slow and training efficiency is extremely low.
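For reference, the centralized round just described can be summarized by the following minimal Python sketch; the Node class, its attributes and the function names are assumptions made only for illustration and are not prescribed by the patent.

```python
# Minimal sketch of one round of the centralized scheme of FIG. 1.
# All names here are illustrative assumptions, not taken from the patent.
from typing import List

class Node:
    def __init__(self, params: List[float]):
        self.params = params

    def train_one_round(self) -> List[float]:
        # Placeholder for one pass of training on this node's private samples.
        self.params = [p - 0.01 for p in self.params]
        return self.params

def centralized_round(nodes: List[Node]) -> None:
    # The central server must wait for every node before averaging, so the
    # round is paced by the slowest node and stalls if the server fails.
    collected = [n.train_one_round() for n in nodes]
    averaged = [sum(vals) / len(nodes) for vals in zip(*collected)]
    for n in nodes:
        n.params = list(averaged)   # broadcast the averaged parameters

centralized_round([Node([0.5, -0.2]), Node([0.1, 0.3])])
```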
Disclosure of Invention
In view of the above, there is a need to provide a deep learning model training method, apparatus, device and storage medium capable of eliminating communication bottleneck of a central server and enabling training nodes to interact more frequently.
According to a first aspect of the present invention, there is provided a deep learning model training method, the method comprising:
configuring a plurality of training nodes to use respective local sample data to carry out model training, and counting the training times of each training node;
responding to the completion of each training, acquiring the training times of the node and other nodes, and comparing the training times of the node with the training times of other nodes;
in response to the fact that the training times of other nodes are larger than the training times of the node, updating the model parameters of the node by using the model parameters of other nodes, performing model training by using the updated model parameters and local sample data by using the node, and updating the model parameters and the training times;
and responding to the condition that the training times of other nodes are less than or equal to the training times of the node, performing model training by using the updated model parameters and local sample data of the previous training of the node, and updating the model parameters and the training times.
In one embodiment, the method further comprises:
comparing the training times of the node with preset iteration times;
responding to the fact that the training times of the node are smaller than the preset iteration times, returning to the step of obtaining the training times of the node and other nodes and comparing the training times of the node with the training times of other nodes;
and ending the model training of the node in response to the fact that the training time of the node is equal to the preset iteration time.
In one embodiment, the step of obtaining the training times of the node and the other nodes, and comparing the training times of the node with the training times of the other nodes includes:
responding to the completion of the previous training of the node, and sending a training frequency acquisition request to other nodes by the node, wherein the training frequency acquisition request comprises the training frequency of the node;
and other nodes receive the training frequency acquisition request, and respond to the fact that the training frequency of other nodes is greater than the training frequency of the node, and return the training parameters and the training sample number of other nodes to the node.
In one embodiment, the step of updating the model parameters of the node by using the model parameters of other nodes includes:
the node selects a target node from other nodes;
acquiring training parameters and training sample numbers of the target node and total training sample numbers of a plurality of training nodes;
and carrying out weight averaging on the training parameters of the target node to obtain new model parameters, and updating the model of the node by using the new model parameters.
In one embodiment, in the step of performing weight averaging on the training parameters of the target node to obtain new model parameters, the weight averaging is performed using the following formula:

$$W_S = \sum_i \frac{a_i}{A} W_i$$

where a_i represents the training sample number of the ith training node, A represents the total training sample number of the plurality of training nodes, W_i represents the model parameters of the ith training node, and W_S represents the new model parameters after weight averaging.
In one embodiment, the step of the local node selecting a target node from other nodes includes:
the other nodes return the training times to the node;
and in response to the node receiving training parameters and training sample numbers returned by a plurality of other nodes, the node takes the other training node with the largest training times as the target node.
In one embodiment, each training node is configured to cryptographically store local sample data.
According to a second aspect of the present invention, there is provided a deep learning model training apparatus, the apparatus comprising:
the configuration module is used for configuring the plurality of training nodes to use respective local sample data to carry out model training and counting the training times of each training node;
the comparison module is used for responding to the completion of each training, acquiring the training times of the node and other nodes and comparing the training times of the node with the training times of other nodes;
the first model updating module is used for updating the model parameters of the node by using the model parameters of other nodes when the training times of other nodes are greater than the training times of the node, carrying out model training by using the updated model parameters and local sample data by the node, and updating the model parameters and the training times;
and the second model updating module is used for performing model training by using the model parameters updated by the previous training of the node and the local sample data and updating the model parameters and the training times when the training times of other nodes are less than or equal to the training times of the node.
According to a third aspect of the present invention, there is also provided a computer apparatus comprising:
at least one processor; and
a memory storing a computer program capable of running on the processor, wherein the processor performs the aforementioned deep learning model training method when executing the program.
According to a fourth aspect of the present invention, there is also provided a computer-readable storage medium storing a computer program which, when executed by a processor, performs the aforementioned deep learning model training method.
In the deep learning model training method described above, a plurality of training nodes are configured to perform model training using their respective local sample data, and the training times of each training node are counted. After each training pass, if some other node has training times greater than those of the node, the model parameters of the node are updated with that node's model parameters and subsequent training is performed with the updated model parameters and the local sample data; if no other node has training times greater than those of the node, the node continues training with the model parameters updated in its own previous training and the local sample data. The central node is thereby removed and its communication bottleneck eliminated; the training data stays local, which effectively protects the data privacy of each training node; and differences in training speed between nodes are well tolerated. As a result, the nodes interact more frequently, the model convergence of each node is accelerated, and the model training efficiency is improved. In addition, when any training node fails, the other training nodes can continue to operate, which improves the robustness of model training.
In addition, the invention also provides a deep learning model training device, a computer device and a computer readable storage medium, which can also achieve the technical effects and are not repeated herein.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of a conventional centralized deep learning training method;
FIG. 2 is a schematic flow chart of a deep learning model training method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a plurality of training node interactions provided by the present invention;
FIG. 4 is a schematic structural diagram of a deep learning model training apparatus according to another embodiment of the present invention;
fig. 5 is an internal structural view of a computer device according to another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish between two entities or parameters that share the same name but are not identical. "First" and "second" are merely for convenience of description and should not be construed as limiting the embodiments of the present invention, and this will not be repeated in the following embodiments.
In an embodiment, referring to fig. 2, the invention provides a deep learning model training method, including:
s100, configuring a plurality of training nodes to use respective local sample data to perform model training, and counting the training times of each training node; the training centers are all in network accessibility, each training node holds training sample data of the training node, and the training sample data is not transmitted among the training nodes. Preferably, each training node is configured to encrypt and store local sample data.
For example, assume that there are a training node C1 and a training node C2, that training node C1 is regarded as the node and C2 as the other training node, and that training node C1 has 10 training sample data stored locally at node C1 while training node C2 has 15 training sample data stored locally at node C2; the respective training sample data of training node C1 and training node C2 are not transmitted to each other.
S200, responding to the completion of each training, acquiring the training times of the node and other nodes, and comparing the training times of the node with the training times of other nodes;
the node refers to any one of the training nodes to be trained, and the other nodes refer to the training nodes except the node.
S300, in response to the fact that the training times of other nodes are larger than the training times of the node, updating the model parameters of the node by using the model parameters of other nodes, performing model training by using the updated model parameters and local sample data by using the node, and updating the model parameters and the training times;
and S400, responding to the fact that the training times of other nodes are less than or equal to the training times of the node, performing model training by using the model parameters updated by the previous training of the node and local sample data, and updating the model parameters and the training times.
In the deep learning model training method described above, a plurality of training nodes are configured to perform model training using their respective local sample data, and the training times of each training node are counted. After each training pass, if some other node has training times greater than those of the node, the model parameters of the node are updated with that node's model parameters and subsequent training is performed with the updated model parameters and the local sample data; if no other node has training times greater than those of the node, the node continues training with the model parameters updated in its own previous training and the local sample data. The central node is thereby removed and its communication bottleneck eliminated; the training data stays local, which effectively protects the data privacy of each training node; and differences in training speed between nodes are well tolerated. As a result, the nodes interact more frequently, the model convergence of each node is accelerated, and the model training efficiency is improved. In addition, when any training node fails, the other training nodes can continue to operate, which improves the robustness of model training.
In another embodiment, in order to ensure the model convergence of each training node, the method further includes:
s410, comparing the training times of the node with preset iteration times;
s420, in response to the fact that the training times of the node are smaller than the preset iteration times, returning to the step of obtaining the training times of the node and other nodes and comparing the training times of the node with the training times of other nodes;
and S430, responding to the fact that the training time of the node is equal to the preset iteration time, and ending the model training of the node.
It should be noted that, in a specific implementation, the preset iteration number may be set according to the number of training nodes, the number of samples on each node, the total number of samples across all training nodes, and the user's accuracy requirement for the model. The preset iteration number of each training node may be the same or different. For example, assuming there are training node C1 and training node C2, when training node C1 is the node, its preset iteration number may be set to 100, and when training node C2 is the node to be trained, its preset iteration number may be set to 200. The present invention does not limit the preset iteration number of each training node; this embodiment is only illustrative.
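Purely for illustration, the per-node procedure of steps S100 to S430 could look like the following Python sketch. The Node class, its attributes and the two-node weighting choice are assumptions made for readability: the patent fixes the count-comparison logic but not any particular API, and the exact normalization of the weight average is given only by the formula further below.

```python
# Illustrative sketch of the per-node loop in steps S100-S430.
# Node, local_train and the two-node weight average are assumptions; the
# patent prescribes the count comparison, not a concrete implementation.
from dataclasses import dataclass
from typing import List

@dataclass
class Node:
    params: List[float]
    sample_count: int
    train_count: int = 0

    def local_train(self) -> None:
        # Placeholder for one training pass on this node's private samples.
        self.params = [p - 0.01 for p in self.params]
        self.train_count += 1                      # S100: per-node counter

def run_node(node: Node, peers: List[Node], max_iterations: int) -> None:
    while node.train_count < max_iterations:       # S410/S430: stop condition
        # S200: after each pass, compare training times with every other node.
        ahead = [p for p in peers if p.train_count > node.train_count]
        if ahead:
            # S300: some peer is further ahead; take the one with the largest
            # count as the target and weight-average its parameters in.
            target = max(ahead, key=lambda p: p.train_count)
            total = node.sample_count + target.sample_count
            node.params = [
                (node.sample_count * w_n + target.sample_count * w_t) / total
                for w_n, w_t in zip(node.params, target.params)
            ]
        # S400: otherwise keep the parameters from the previous local pass.
        node.local_train()
```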
In another embodiment, the step S200 specifically includes the following sub-steps:
s210, responding to the fact that the node finishes the previous training, the node sends a training frequency obtaining request to other nodes, wherein the training frequency obtaining request comprises the training frequency of the node;
and S220, other nodes receive the training frequency acquisition request, and in response to the fact that the training frequency of other nodes is greater than the training frequency of the node, the training parameters and the training sample number of other nodes are returned to the node.
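As a minimal sketch of the exchange in S210 and S220, the peer-side handling could look as follows; the plain-dict message format and the function name handle_count_request are assumptions, and the fields are those of the hypothetical Node class sketched above.

```python
# Illustrative peer-side handling of the training-times acquisition request
# (S210-S220). The message format is an assumption for this sketch only.
from typing import Optional

def handle_count_request(peer, request: dict) -> Optional[dict]:
    """peer: the receiving training node; request carries the sender's count."""
    if peer.train_count > request["train_count"]:
        # S220: only a peer whose training times exceed the requester's
        # replies with its model parameters and its training sample number.
        return {"params": peer.params,
                "sample_count": peer.sample_count,
                "train_count": peer.train_count}
    return None     # peers that are not ahead return no parameters
```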
In another embodiment, the foregoing step S300 specifically includes the following sub-steps:
s310, the node selects a target node from other nodes;
s320, acquiring training parameters and training sample numbers of the target node and total training sample numbers of a plurality of training nodes;
s330, carrying out weight average on the training parameters of the target node to obtain new model parameters, and updating the model of the node by using the new model parameters.
Preferably, in the step of performing weight averaging on the training parameters of the target node to obtain new model parameters, the weight averaging is performed using the following formula:

$$W_S = \sum_i \frac{a_i}{A} W_i$$

where a_i represents the training sample number of the ith training node, A represents the total training sample number of the plurality of training nodes, W_i represents the model parameters of the ith training node, and W_S represents the new model parameters after weight averaging.
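A direct Python reading of this formula might look like the following sketch. The function name and the choice of which nodes appear in the sum are assumptions: the patent states the per-node weights a_i/A but leaves the exact index set of the sum implicit, so the sketch sums over whichever nodes contribute parameters.

```python
# Sketch of the weight average W_S = sum_i (a_i / A) * W_i from the formula
# above. The index set of the sum is an assumption (here: whichever nodes
# contribute parameters); total_samples plays the role of A.

def weight_average(contributors, total_samples):
    """contributors: list of (a_i, W_i) pairs, W_i given as a list of floats."""
    dim = len(contributors[0][1])
    w_s = [0.0] * dim
    for a_i, w_i in contributors:
        for k in range(dim):
            w_s[k] += (a_i / total_samples) * w_i[k]
    return w_s

# Example with the sample counts from the two-node illustration earlier
# (C1: 10 samples, C2: 15 samples) and made-up 2-dimensional parameters:
print(weight_average([(10, [0.2, 0.4]), (15, [0.6, 0.0])], total_samples=25))
# -> [0.44, 0.16]
```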
In another embodiment, the foregoing step S310 specifically includes the following sub-steps:
s311, the other nodes return the training times to the node;
s312, in response to the node receiving the training parameters and the training sample numbers returned by the other nodes, the node takes some other training node corresponding to the maximum training times as a target node.
In another embodiment, please refer to fig. 3, in order to facilitate understanding of the technical solution of the present invention, the following method takes four training nodes as an example and has the following steps:
step 1, distributing respective training sample data to training nodes C1, C2, C3 and C4, and configuring the four training nodes to be mutually reachable over the network, wherein each training node holds its own training sample data, and the training sample data is neither transmitted nor shared between training nodes;
step 2, randomly initializing the training parameters of training nodes C1 to C4, and assigning each training node a training-times label Vi, where the initialization vector V = [V1, V2, V3, V4] is all zeros and V1 represents the training times of training node C1;
step 3, carrying out model training by using local sample data of training nodes C1 to C4 respectively, and increasing the training times of the training nodes C1 to C4 by 1 after each training is finished;
step 4, randomly selecting, from the four training nodes, one node that has completed training, denoting it as i, and storing the training times of node i as Vold;
step 5, training node i sends a request to all other training nodes to acquire their training times;
step 6, after receiving the request, each other training node j compares its own training times Vj with Vold; if Vj is greater than Vold, training node j sends the training parameters of its model and its training sample number to training node i;
step 7, after training node i has received all the replies, it performs weight averaging on the received model parameters and updates its model parameters with the weight-averaged result;
step 8, training node i performs model training once using the updated model, and then returns to the above steps to repeat the process.
For example, suppose training node C1 is the node and sends a request for acquiring training times to training nodes C2, C3 and C4 after completing its 6th training. If training nodes C2, C3 and C4 have at that moment completed 4, 3 and 9 trainings respectively, only training node C4 sends its model training parameters and its training sample number to training node C1; the new model parameters obtained by weight averaging with the above formula are used to update training node C1, and training node C1 then performs one model training pass with the new model parameters and its local training sample data, so that the training times of training node C1 become 7. If training nodes C2, C3 and C4 have instead completed 4, 3 and 6 trainings respectively, the node directly performs its 7th model training with its own model parameters from the 6th training and its local training sample data. Training ends when the training times of all training nodes reach their preset iteration numbers.
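The branch taken in this example can be checked with a few lines of Python (counts and node names are taken from the example above; the dict-based bookkeeping is only for illustration):

```python
# Re-running the decision from the worked example: C1 has finished 6 passes.
own = 6

counts = {"C2": 4, "C3": 3, "C4": 9}
print({n: c for n, c in counts.items() if c > own})   # {'C4': 9}
# Only C4 is ahead, so C1 weight-averages C4's parameters in, trains once,
# and its training times become 7.

counts = {"C2": 4, "C3": 3, "C4": 6}
print({n: c for n, c in counts.items() if c > own})   # {} -> nobody is ahead
# C1 keeps its own parameters from the 6th pass for its 7th training.
```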
In another embodiment, referring to fig. 4, the present invention further provides a deep learning model training apparatus 60, which includes:
the configuration module 61 is configured to configure each of the plurality of training nodes to perform model training using respective local sample data, and count the number of times of training of each training node;
a comparison module 62, configured to, in response to completion of each training, obtain training times of the node and other nodes, and compare the training times of the node with training times of other nodes;
the first model updating module 63 is configured to update the model parameters of the node by using the model parameters of other nodes when the training times of other nodes are greater than the training times of the node, perform model training by using the updated model parameters and local sample data of the node, and update the model parameters and the training times;
and a second model updating module 64, configured to perform model training by using the model parameters and local sample data updated by the previous training of the node when the training times of the other nodes are less than or equal to the training time of the node, and update the model parameters and the training times.
It should be noted that, for specific limitations of the deep learning model training apparatus, reference may be made to the above limitations of the deep learning model training method, and details are not described here again. The modules in the deep learning model training device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
According to another aspect of the present invention, a computer device is provided, the computer device may be a server, and the internal structure thereof is shown in fig. 5. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements the deep learning model training method described above.
According to a further aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the deep learning model training method described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A deep learning model training method, the method comprising:
configuring a plurality of training nodes to use respective local sample data to carry out model training, and counting the training times of each training node;
responding to the completion of each training, acquiring the training times of the node and other nodes, and comparing the training times of the node with the training times of other nodes;
in response to the fact that the training times of other nodes are larger than the training times of the node, updating the model parameters of the node by using the model parameters of other nodes, performing model training by using the updated model parameters and local sample data by using the node, and updating the model parameters and the training times;
and responding to the condition that the training times of other nodes are less than or equal to the training times of the node, performing model training by using the updated model parameters and local sample data of the previous training of the node, and updating the model parameters and the training times.
2. The method of claim 1, further comprising:
comparing the training times of the node with preset iteration times;
responding to the fact that the training times of the node are smaller than the preset iteration times, returning to the step of obtaining the training times of the node and other nodes and comparing the training times of the node with the training times of other nodes;
and ending the model training of the node in response to the fact that the training time of the node is equal to the preset iteration time.
3. The method of claim 1, wherein the step of obtaining the number of times of training of the node and the other nodes, and comparing the number of times of training of the node with the number of times of training of the other nodes comprises:
responding to the completion of the previous training of the node, and sending a training frequency acquisition request to other nodes by the node, wherein the training frequency acquisition request comprises the training frequency of the node;
and other nodes receive the training frequency acquisition request, and respond to the fact that the training frequency of other nodes is greater than the training frequency of the node, and return training parameters and training sample numbers of other nodes to the node.
4. The method of claim 3, wherein the step of updating the model parameters of the node with the model parameters of the other nodes comprises:
the node selects a target node from other nodes;
acquiring training parameters and training sample numbers of the target node and total training sample numbers of a plurality of training nodes;
and carrying out weight averaging on the training parameters of the target node to obtain new model parameters, and updating the model of the node by using the new model parameters.
5. The method according to claim 4, wherein in the step of performing weight averaging on the training parameters of the target node to obtain new model parameters, the weight averaging is performed using the following formula:

$$W_S = \sum_i \frac{a_i}{A} W_i$$

wherein a_i represents the training sample number of the ith training node, A represents the total training sample number of the plurality of training nodes, W_i represents the model parameters of the ith training node, and W_S represents the new model parameters after weight averaging.
6. The method of claim 4, wherein the step of the local node selecting a target node from other nodes comprises:
the other nodes return the training times to the node;
and in response to the node receiving training parameters and training sample numbers returned by a plurality of other nodes, the node takes the other training node with the largest training times as the target node.
7. The method of any of claims 1-6, wherein each training node is configured to cryptographically store local sample data.
8. An apparatus for deep learning model training, the apparatus comprising:
the configuration module is used for configuring the plurality of training nodes to use respective local sample data to carry out model training and counting the training times of each training node;
the comparison module is used for responding to the completion of each training, acquiring the training times of the node and other nodes and comparing the training times of the node with the training times of other nodes;
the first model updating module is used for updating the model parameters of the node by using the model parameters of other nodes when the training times of other nodes are greater than the training times of the node, carrying out model training by using the updated model parameters and local sample data by the node, and updating the model parameters and the training times;
and the second model updating module is used for performing model training by using the model parameters updated by the previous training of the node and the local sample data and updating the model parameters and the training times when the training times of other nodes are less than or equal to the training times of the node.
9. A computer device, comprising:
at least one processor; and
a memory storing a computer program capable of running on the processor, wherein the processor, when executing the program, performs the method of any one of claims 1-7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 7.
CN202011404833.6A 2020-12-04 2020-12-04 Deep learning model training method, device, equipment and storage medium Withdrawn CN112465048A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011404833.6A CN112465048A (en) 2020-12-04 2020-12-04 Deep learning model training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011404833.6A CN112465048A (en) 2020-12-04 2020-12-04 Deep learning model training method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112465048A true CN112465048A (en) 2021-03-09

Family

ID=74806068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011404833.6A Withdrawn CN112465048A (en) 2020-12-04 2020-12-04 Deep learning model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112465048A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657113A (en) * 2021-08-24 2021-11-16 北京字跳网络技术有限公司 Text processing method and device and electronic equipment
CN114004358A (en) * 2021-12-29 2022-02-01 粤港澳大湾区数字经济研究院(福田) Deep learning model training method
CN114004358B (en) * 2021-12-29 2022-06-14 粤港澳大湾区数字经济研究院(福田) Deep learning model training method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20210309)