CN112400160A - Method and apparatus for training neural network

Method and apparatus for training neural network

Info

Publication number
CN112400160A
Authority
CN
China
Prior art keywords
training
resource
neural network
training resource
parameter
Prior art date
Legal status
Pending
Application number
CN201880095511.2A
Other languages
Chinese (zh)
Inventor
张丰伟
沈灿泉
邵云峰
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN112400160A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods


Abstract

A method for training a neural network comprises: determining the number of training parameters of a neural network training task; determining a target training resource from a training resource library according to the number of training parameters, wherein the training resource library comprises at least one training resource, a correspondence exists between the at least one training resource and at least one parameter count, the at least one training resource comprises the target training resource, and the at least one parameter count comprises the number of training parameters of the neural network training task; and executing the neural network training task through the target training resource. With this method, the data center can determine the target training resource from the training resource library according to the neural network training task and complete the task without providing the infrastructure of the resource pool to the user, which reduces the risk caused by exposing the resource-pool infrastructure and improves the security of the data center.

Description

Method and apparatus for training neural network
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method and an apparatus for training a neural network.
Background
A neural network is a mathematical model that can learn a solution from data; it is widely applied in fields such as image recognition, speech recognition, and natural language processing. A neural network usually needs to be trained on a large number of training samples before it can be used, and the number of training samples and the number of model parameters are the main factors that constrain the training rate of the neural network.
To accelerate the training of a neural network, a high-performance processor is required; however, such processors are expensive for individual users and small and medium-sized enterprises, which adversely affects the development and application of neural networks.
One way to solve this problem is to deploy high-performance processors in the cloud to form a computing resource pool that provides a computing-resource leasing service. Users can then train neural networks without purchasing high-performance processors, which addresses the high cost of neural network development for individual users and small and medium-sized enterprises.
However, the infrastructure (e.g., the topology) of the computing resource pool is usually not exposed to the user, while the training efficiency of a neural network is closely related to that infrastructure. Without a clear picture of the infrastructure, it is difficult for the user to make a suitable choice.
Disclosure of Invention
The present application provides a method and an apparatus for training a neural network that can provide a neural network training service to a user without exposing the computing resource pool to the user.
In a first aspect, a method for training a neural network is provided, comprising: determining the number of training parameters of a neural network training task; determining a target training resource from a training resource library according to the number of training parameters, wherein the training resource library comprises at least one training resource, a correspondence exists between the at least one training resource and at least one parameter count, the at least one training resource comprises the target training resource, and the at least one parameter count comprises the number of training parameters of the neural network training task; and executing the neural network training task through the target training resource.
With this method, the data center can determine the target training resource from the training resource library according to the neural network training task and complete the task without providing the infrastructure of the resource pool to the user, which reduces the risk caused by exposing the resource-pool infrastructure and improves the security of the data center.
In addition, the user does not need to determine which training resources are needed to complete the neural network training task; the user only needs to send the requirement to the data center, which improves user satisfaction.
Optionally, before determining the target training resource from the training resource library according to the number of training parameters, the method further includes: establishing the training resource library, wherein the target training resource comprises a plurality of computing units and transmission links between the computing units, and the correspondence comprises an association among the following three: the target training resource, at least one parameter count, and a parameter update rate for the at least one parameter count.
The data center can establish the training resource library through testing, thereby obtaining a library that matches the actual conditions of the data center.
Optionally, establishing the training resource library includes: updating a plurality of neural network parameters through the target training resource, wherein the number of the plurality of neural network parameters is any one of the at least one parameter count; determining the parameter update rate of the plurality of neural network parameters according to their update completion time, wherein the parameter update rate is inversely proportional to the update completion time; and storing the correspondence among the parameter update rate of the plurality of neural network parameters, the number of the plurality of neural network parameters, and the target training resource.
The data center can use small batches of data to update different numbers of neural network parameters on different training resources to obtain multiple parameter update rates, and record the association among the training resources, the parameter counts, and the parameter update rates, thereby obtaining the training resource library. For a fixed number of neural network parameters, the shorter the update completion time, the faster the update rate; the longer the update completion time, the slower the update rate.
Optionally, the neural network training task further comprises a training model of the neural network training task and a specified sample iteration number, wherein the sample iteration number is the number of training samples that must be input to update the parameters once.
In that case, determining the target training resource from the training resource library according to the number of training parameters includes: determining, according to the correspondence, at least one candidate training resource corresponding to the number of training parameters from the training resource library; testing the training model on the at least one candidate training resource and determining the parameter generation rate of the at least one candidate training resource; determining the preferred sample iteration number of the at least one candidate training resource according to the parameter generation rate, wherein the preferred sample iteration number is the sample iteration number of a candidate training resource when its parameter generation rate matches the parameter update rate; and determining, from the at least one candidate training resource, the candidate training resource whose preferred sample iteration number is closest to the specified sample iteration number as the target training resource.
If the user specifies the training model and the sample iteration number, the target training resource meeting the user's requirement can be determined according to this scheme. In some cases the user knows the characteristics of the training model better than the data center and can specify the sample iteration number according to those characteristics, so this scheme can improve the training efficiency of the neural network. The user may also specify a suitable sample iteration number based on budget.
Optionally, the neural network training task further comprises a training model of the neural network training task.
In that case, determining the target training resource from the training resource library according to the number of training parameters includes: determining, according to the correspondence, at least one candidate training resource corresponding to the number of training parameters from the training resource library; testing the training model on the at least one candidate training resource and determining the parameter generation rate of the at least one candidate training resource; determining the preferred sample iteration number of the at least one candidate training resource according to the parameter generation rate, wherein the preferred sample iteration number is the sample iteration number of a candidate training resource when its parameter generation rate matches the parameter update rate; and determining, from the at least one candidate training resource, the candidate training resource with the largest preferred sample iteration number as the target training resource.
If the user does not specify the sample iteration number, the target training resource meeting the user's requirement can be determined according to this scheme.
Optionally, in the target training resource, the number of training samples carried by any one of the plurality of computing units is proportional to the parameter update rate of that computing unit.
This scheme can reasonably distribute the number of samples carried by each computing unit of the target training resource.
In a second aspect, the present application provides an apparatus for training a neural network. The apparatus can implement the functions corresponding to the steps of the method according to the first aspect; the functions may be implemented by hardware or by hardware executing corresponding software. The hardware or software includes one or more units or modules corresponding to the above functions.
In one possible design, the apparatus includes a processor configured to support the apparatus in performing the corresponding functions of the method according to the first aspect. The apparatus may further include a memory, coupled to the processor, that stores the program instructions and data necessary for the apparatus. Optionally, the apparatus further comprises a communication interface for supporting communication between the apparatus and other devices.
In a third aspect, the present application provides a computer program product comprising: computer program code which, when executed by a processor of an apparatus (e.g. a server) for training a neural network, causes the apparatus for training a neural network to perform the method of the first aspect.
In a fourth aspect, the present application provides a computer storage medium storing computer software instructions for an apparatus for training a neural network as described above, comprising a program designed to perform the method of the first aspect.
In a fifth aspect, the present application provides a system for training a neural network, comprising the apparatus of the second aspect, the computer program product of the third aspect, and the computer storage medium of the fourth aspect.
Drawings
FIG. 1 is a schematic view of a ring suitable for use in the present application;
FIG. 2 is a schematic diagram of an initial state in which the individual computing units of the ring execute a ring aggregation algorithm;
FIG. 3 is a schematic diagram of one step of a ring aggregation algorithm;
FIG. 4 is a schematic diagram of another step of the ring aggregation algorithm;
FIG. 5 is a schematic diagram of an end state of each computing unit of the ring executing the ring aggregation algorithm;
FIG. 6 is a schematic diagram of a method of training a neural network provided herein;
FIG. 7 is a schematic diagram of an apparatus for training a neural network provided herein;
FIG. 8 is a schematic diagram of another apparatus for training a neural network provided herein;
FIG. 9 is a schematic diagram of a system for training a neural network provided herein.
Detailed Description
In order to facilitate understanding of the technical solutions of the present application, first, concepts related to the present application are briefly introduced.
To improve the training efficiency of a neural network (especially a deep neural network), one method is to train with a distributed training algorithm, whose flow is as follows (a code sketch of one iteration follows the list):
1. each computing unit in a cluster formed by a plurality of computing units (also called computing nodes) independently completes the computation on its own mini-batch of training data to obtain a gradient;
2. all the computing units in the cluster aggregate the computed gradients to form an aggregated gradient;
3. the aggregated gradient is distributed to each computing unit in the cluster;
4. each computing unit computes new neural network parameters based on the aggregated gradient combined with hyperparameters such as the learning rate, where the neural network parameters are the parameters that form the neural network model and may be referred to simply as parameters;
5. all the computing units can start the next round of iterative computation only after acquiring the new parameters.
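To make the flow concrete, the following is a minimal sketch of one synchronous iteration in Python. It is a sketch under assumed interfaces: the ComputeUnit class, its mini-batch and gradient methods, and the use of a simple mean as the aggregation step are illustrative placeholders, not the patent's implementation (the mean stands in for the ring aggregation described next).

    import numpy as np

    class ComputeUnit:
        """Placeholder for one computing unit holding a parameter replica."""
        def __init__(self, params):
            self.params = params.copy()

        def next_mini_batch(self):
            return np.random.rand(32, self.params.shape[0])  # stand-in data

        def compute_gradient(self, batch):
            return batch.mean(axis=0)  # stand-in for backprop on the mini-batch

    def train_step(units, learning_rate=0.01):
        # 1. Each unit independently computes a gradient on its own mini-batch.
        grads = [u.compute_gradient(u.next_mini_batch()) for u in units]
        # 2-3. Aggregate all gradients and distribute the result to every unit.
        aggregated = np.mean(grads, axis=0)
        # 4. Each unit derives new parameters from the aggregated gradient and
        #    hyperparameters such as the learning rate.
        for u in units:
            u.params -= learning_rate * aggregated
        # 5. Only after every unit holds the new parameters may the next
        #    iteration start (implicit here, since the loop is synchronous).

    units = [ComputeUnit(np.zeros(8)) for _ in range(5)]
    train_step(units)  # one synchronous iteration across the cluster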
For efficient gradient aggregation, a ring aggregation (ring all-reduce) algorithm is commonly used in academia and industry; the logical structure of the ring is shown in FIG. 1.
In FIG. 1, the ring 100 includes 5 computing units located in a system, where the system is a device or a cluster formed by a plurality of devices. Each computing unit may be one apparatus or device, or multiple computing units may be located in one apparatus or device. The apparatus or device may be any type of electronic device, including but not limited to a server, a mainframe, a minicomputer, a portable computer, or a terminal. Each computing unit may be a computing element in an apparatus or device, such as a chip, a chipset, or a circuit board carrying a chip or chipset.
A computing unit may be a neural network processing unit (NPU), a graphics processing unit (GPU), a central processing unit (CPU), a field-programmable gate array (FPGA), or another processor. The 5 computing units shown in FIG. 1 may be chips of the same type or of different types.
Each computing unit has a preceding unit and a subsequent unit, and the position of each computing unit in the ring is determined by the creator of the ring (e.g., user software). For example, the preceding unit of computing unit 0 is computing unit 4, and the subsequent unit of computing unit 0 is computing unit 1. Each computing unit can receive data from its preceding unit and send its own data to its subsequent unit.
Taking the ring 100 shown in FIG. 1 as an example, in the preparation stage of the ring aggregation algorithm, the creator of the ring 100 (e.g., user software) sends control information to each computing unit and slices the data: the gradient data computed by each computing unit is divided into 5 equal blocks. For example, the gradient data computed by the 5 computing units shown in FIG. 1 are a, b, c, d, and e; each computing unit possesses the complete data it computed itself, and the initial states of the 5 computing units are shown in FIG. 2.
Subsequently, the 5 computing units enter the scatter-reduce (scatter reduction) stage: each computing unit sends one of its data blocks to its subsequent unit and aggregates the data received from its preceding unit with the data it stores.
FIG. 3 shows one step of the scatter-reduce stage. In this step, computing unit 0 sends the data block (chunk) a0 to computing unit 1; after receiving a0, computing unit 1 performs an aggregation operation on a0 and its own data block a1. Meanwhile, computing unit 1 sends the data block b1 to computing unit 2; after receiving b1, computing unit 2 performs an aggregation operation on its own data block b2 and b1. The other computing units operate similarly.
FIG. 4 shows another step of the scatter-reduce stage. In this step, taking computing unit 0 as an example, computing unit 0 receives the data b4+b3+b2+b1 from its preceding unit (computing unit 4) and aggregates it with its own data b0, obtaining the result b0+b1+b2+b3+b4. While receiving b4+b3+b2+b1, computing unit 0 sends its stored data c0+c4+c3+c2 to its subsequent unit (computing unit 1) so that the subsequent unit can perform its gradient aggregation operation.
After the scatter-reduce stage is complete, the ring aggregation algorithm proceeds to the next stage, the all-gather stage. In the all-gather stage, the ring 100 delivers the final result held by each computing unit to the other computing units through 4 passes. For example, the final result of computing unit 0's aggregation of the data b is b0+b1+b2+b3+b4; computing unit 0 passes this result to computing unit 1, computing unit 1 passes it to computing unit 2, and so on, so that after 4 passes every computing unit holds the final aggregation result for the data b. Similarly, for the other 4 data (a, c, d, and e), after 4 passes every computing unit also holds the final aggregation result, as shown in FIG. 5.
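The two stages can be made concrete with a small simulation. The following is a sketch of the standard ring all-reduce schedule, written to match the description above rather than taken from the patent: in scatter-reduce step s, unit i sends chunk (i - s) mod n to its subsequent unit, and in the all-gather stage the fully aggregated chunks circulate around the ring.

    import numpy as np

    def ring_all_reduce(grads):
        """Simulate ring all-reduce over n computing units; grads[i] is the
        gradient vector of unit i and is sliced into n chunks."""
        n = len(grads)
        data = [np.array_split(g, n) for g in grads]  # preparation: slicing
        # Scatter-reduce: in step s, unit i sends chunk (i - s) mod n to its
        # subsequent unit, which aggregates it with its own copy.
        for s in range(n - 1):
            for i in range(n):
                c = (i - s) % n
                data[(i + 1) % n][c] = data[(i + 1) % n][c] + data[i][c]
        # All-gather: each fully aggregated chunk is passed around the ring
        # in n - 1 passes, overwriting the stale copies.
        for s in range(n - 1):
            for i in range(n):
                c = (i + 1 - s) % n
                data[(i + 1) % n][c] = data[i][c]
        return [np.concatenate(chunks) for chunks in data]

    # After the call, every unit holds the same aggregated gradient.
    out = ring_all_reduce([np.ones(10) * i for i in range(5)])
    assert all(np.array_equal(out[0], o) for o in out)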
As the above training algorithm shows, two factors affect the training efficiency of the neural network in a distributed training scheme. One is the computing capability of each computing unit, for example the rate at which a computing unit processes a fixed number of training samples and generates gradients; the other is the transmission capability between the computing units, for example the rate at which a gradient is transmitted between two computing units. For a data center providing a computing resource pool (hereinafter simply "resource pool"), the computing capability of the computing units and the transmission rate between the computing units belong to the training resources.
In the following, the method for training a neural network provided by the present application is described in detail, taking the ring 100 as an example. It should be noted that the method provided by the present application is not limited to the ring-shaped distributed architecture shown in FIG. 1; it may be applied to any distributed training architecture, for example a reduction tree (reduce-tree).
Fig. 6 is a schematic diagram illustrating a method of training a neural network provided by the present application.
In the method 600 shown in FIG. 6, the data center includes 3 modules: a training module, an adaptive module, and a repository management module. The 3 modules are divided only by function; they may be independent modules or submodules of one module. In addition, the 3 modules may be hardware circuits or software programs. The specific form of these 3 modules is not limited in this application.
The data center may provide neural network training services to users by performing the following steps.
S601, establishing a training resource library.
Before providing a neural network training service to a user, the data center first determines the correspondence between training resources and numbers of neural network parameters ("parameter counts" for short, where neural network parameters may be referred to simply as "parameters"), that is, it establishes a training resource library. In this application, the training resource library is a database containing the above correspondence, which is not limited to the correspondence between training resources and parameter counts; for example, the correspondence may also relate training resources, parameter counts, and the update rates of the parameters corresponding to those counts.
The correspondence can be interpreted as follows: for a group of a fixed number of parameters, different training resources in the resource pool are used to update the group; if a training resource completes the update, a correspondence is determined between that training resource and the number of parameters in the group.
The data center may determine the correspondence by way of testing (i.e., probing).
For example, the data center may obtain the ring 100 shown in FIG. 1 from the resource pool. For a fixed set of parameters, the data center may deploy the set on the ring 100, perform an update test, and obtain a parameter update rate from the update completion time of the set of parameters. By testing different numbers of parameters on the ring 100, the association between the ring 100, different parameter counts, and different parameter update rates can be obtained; testing a group of a fixed number of parameters on different training resources yields the association between that parameter count, different training resources, and different parameter update rates.
Optionally, the data center may input different numbers of training samples (including adjusting the number of training samples input to each computing unit), obtain different parameter update rates, and store the preferred parameter update rate in the training resource library. The preferred parameter update rate is the parameter update rate at which the parameter generation rate of the training resource matches its parameter transmission rate, and it corresponds to a preferred sample iteration number. For example, a fixed number of parameters is tested on the ring 100: inputting 1000 samples at a time yields parameter update rate A; inputting 1500 samples at a time yields parameter update rate B; inputting 2000 samples at a time yields parameter update rate C. If B is the largest of the three, B is taken as the parameter update rate of the ring 100 for that parameter count, and 1500 is the preferred sample iteration number of the ring 100.
In the present application, the sample iteration number is the number of training samples that must be input to update the parameters once.
In the above example, the reason A is smaller than B may be that, because too few samples are input, the computing capability (parameter generation rate) of the ring 100 is smaller than its transmission capability (parameter transmission rate); the reason C is smaller than B may be that, because too many samples are input, the computing capability of the ring 100 exceeds its transmission capability. The parameter update rate of a training resource is therefore fastest only when its computing capability matches (is the same as or approximately the same as) its transmission capability.
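As a sketch of this probing, assuming a caller-supplied run_update_test function that deploys the parameters on the training resource and returns the update completion time (a placeholder, not an API from the patent):

    def probe_preferred_rate(resource, param_count, run_update_test,
                             batch_sizes=(1000, 1500, 2000)):
        """run_update_test(resource, param_count, batch) -> update completion
        time in seconds; the update rate is inversely proportional to it."""
        best_rate, best_batch = 0.0, None
        for batch in batch_sizes:
            rate = param_count / run_update_test(resource, param_count, batch)
            if rate > best_rate:
                best_rate, best_batch = rate, batch
        # best_rate is the preferred parameter update rate; best_batch is the
        # preferred sample iteration number to record in the resource library.
        return best_rate, best_batch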
After the test is finished, the repository management module records the correspondence among the parameter count, the training resource, and the parameter update rate, thereby establishing the training resource library.
The correspondence may take the following form.
Correspondence 1: [Ring 0: GPU0, GPU1, GPU2; (parameter update rate 11, parameter count 11), (parameter update rate 12, parameter count 12), (parameter update rate 13, parameter count 13)].
Correspondence 2: [Ring 1: GPU1, GPU2, GPU3; (parameter update rate 21, parameter count 21), (parameter update rate 22, parameter count 22), (parameter update rate 23, parameter count 23)].
Correspondence 3: [Ring 2: GPU0, GPU2, GPU3; (parameter update rate 31, parameter count 31), (parameter update rate 32, parameter count 32), (parameter update rate 33, parameter count 33)].
In the above correspondences, the parameter counts within the same correspondence differ from one another, while parameter counts in different correspondences may be the same or different. For example, parameter count 11, parameter count 12, and parameter count 13 differ from one another; parameter count 11, parameter count 21, and parameter count 31 may be the same or different.
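One possible in-memory form of these records is sketched below; the dictionary layout and the numeric parameter counts and rates are illustrative placeholders, not the patent's storage format. The lookup function corresponds to the query performed later in S605.

    REPOSITORY = {
        "Ring 0": {"units": ("GPU0", "GPU1", "GPU2"),
                   # (parameter count, preferred parameter update rate) pairs
                   "entries": [(1_000_000, 420.0), (5_000_000, 350.0)]},
        "Ring 1": {"units": ("GPU1", "GPU2", "GPU3"),
                   "entries": [(1_000_000, 510.0), (5_000_000, 330.0)]},
        "Ring 2": {"units": ("GPU0", "GPU2", "GPU3"),
                   "entries": [(2_000_000, 460.0), (5_000_000, 340.0)]},
    }

    def find_candidates(param_count):
        """Return every training resource whose correspondence covers the
        requested parameter count (the candidate training resources)."""
        return [name for name, record in REPOSITORY.items()
                if any(count == param_count for count, _ in record["entries"])]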
It should be understood that S601 is only an optional implementation of the technical solution of the present application; in some cases the data center does not need to execute S601. For example, the manufacturer of the computing units may preconfigure the correspondence in the repository of the data center based on empirical data.
S602, acquiring a neural network training task.
Based on the training task, the data center determines the user's requirement, for example the number of parameters of the neural network to be trained (i.e., the "number of training parameters" in the claims). The user's requirement may also include other information.
For example, the user may specify the training model of the neural network. The data center first determines at least one candidate training resource from the training resource library according to the number of training parameters, and then tests the training model on the at least one candidate training resource to obtain the parameter generation rate of the at least one candidate training resource.
Testing the training model means: deploying the user-specified training model on the candidate training resource, inputting a small batch of samples to generate parameters (e.g., gradients), and obtaining a parameter generation rate (e.g., a gradient generation rate).
The data center then determines the preferred sample iteration number of the at least one candidate training resource according to the parameter generation rate, where the preferred sample iteration number is the sample iteration number of a candidate training resource when its parameter generation rate matches the parameter update rate.
Because different training models differ in complexity, the sample iteration number at which the parameter update rate is maximized on the same training resource differs from model to model. The preferred sample iteration number of a candidate training resource therefore cannot be stored in the training resource library in advance; the user-specified training model must be tested to determine it.
The test procedure is as follows: deploy the user-specified training model on the candidate training resource and input different numbers of samples; the input sample number at which the actual parameter generation rate matches (equals or approximately equals) the parameter update rate of the candidate training resource stored in the training resource library is the preferred sample iteration number of that candidate training resource.
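A sketch of this test procedure, assuming a caller-supplied measure_generation_rate function that deploys the model on the candidate training resource and measures the parameter generation rate for a given batch size (a placeholder, not an API from the patent):

    def find_preferred_iterations(resource, model, stored_update_rate,
                                  measure_generation_rate,
                                  candidate_batches=(500, 1000, 1500, 2000)):
        """Return the input sample number whose measured parameter generation
        rate is closest to the update rate stored in the resource library."""
        return min(candidate_batches,
                   key=lambda b: abs(measure_generation_rate(resource, model, b)
                                     - stored_update_rate))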
The preferred sample iteration numbers of a plurality of candidate training resources are tested, and the candidate training resource with the largest preferred sample iteration number is determined from among them as the target training resource.
If the user specifies a sample iteration number, the data center determines, from the plurality of candidate training resources, the candidate training resource whose preferred sample iteration number is closest to the user-specified number as the target training resource.
For example, suppose there are two candidate training resources: the preferred sample iteration number of candidate training resource A is 5 and that of candidate training resource B is 8. If the user specifies a sample iteration number of 7, candidate training resource B is determined to be the target training resource; if the user specifies a sample iteration number of 6, candidate training resource A is determined to be the target training resource.
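Both selection rules reduce to a few lines. A sketch, with each candidate given as a (name, preferred sample iteration number) pair; the names and numbers follow the example above:

    def select_target(candidates, specified_iterations=None):
        """Pick the target training resource from the candidate list."""
        if specified_iterations is None:
            # No user-specified count: take the largest preferred number.
            return max(candidates, key=lambda c: c[1])
        # Otherwise take the candidate whose preferred count is closest.
        return min(candidates, key=lambda c: abs(c[1] - specified_iterations))

    select_target([("A", 5), ("B", 8)], specified_iterations=7)  # -> ("B", 8)
    select_target([("A", 5), ("B", 8)], specified_iterations=6)  # -> ("A", 5)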
As an alternative example, the user may specify the training model and the model training rate in the training task according to budget: a faster model training rate when the budget is higher, and a slower one when the budget is lower. Through a small-batch data test, the data center can determine a training resource matching the training rate required by the user as the target training resource.
As another alternative example, the user may also specify a training model and training resources for the neural network.
The data center determines the target training resource according to the user's requirement, so that the requirements of different users can be met, improving user satisfaction.
The data center determines the training resource corresponding to the requirement (i.e., the target training resource) from the training resource library according to the requirement, for example by performing S605 and S606.
S605, querying the resource library according to the requirement to obtain candidate training resources.
S606, determining the target training resource from the candidate training resources.
In S605, the adaptive module may send a query message to the repository management module. After obtaining the query message, the repository management module queries the repository for one or more training resources corresponding to the user's requirement (e.g., the number of training parameters), that is, it obtains at least one candidate training resource. Subsequently, the repository management module sends an information list containing the at least one candidate training resource to the adaptive module, and the adaptive module determines the target training resource from the list, for example according to the user's specific requirements and the related description above.
By executing S605 and S606, the data center can determine the target training resource from the training resource library according to the neural network training task and complete the task without providing the infrastructure of the resource pool to the user, which reduces the risk caused by exposing the resource-pool infrastructure and improves the security of the data center.
In addition, the user does not need to determine which training resources are needed to complete the neural network training task; the user only needs to send the requirement to the data center, which improves user satisfaction.
After the adaptive module determines the target training resource, S607 may be performed.
S607, the adaptive module sends the information of the target training resource to the training module.
The information of the target training resource includes, for example, the type and number of computing units, the transmission links between the computing units, and the preferred sample iteration number of the target training resource.
After receiving the information of the target training resource, the training module executes S608.
S608, executing the training task according to the information of the target training resource.
Optionally, the training module may adjust the number of samples deployed on each computing unit to obtain the preferred training rate.
Taking the ring 100 as an example, if computing unit 0 generates 5 gradients per second and computing unit 1 generates 8 gradients per second, fewer samples may be deployed on computing unit 0 and more samples on computing unit 1.
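A sketch of this proportional split; the per-second rates are those of the example above, and the total batch size is illustrative:

    def split_batch(total_samples, generation_rates):
        """Distribute a batch across computing units in proportion to their
        parameter generation rates."""
        shares = [round(total_samples * r / sum(generation_rates))
                  for r in generation_rates]
        shares[-1] += total_samples - sum(shares)  # absorb rounding drift
        return shares

    split_batch(1300, [5, 8])  # computing units 0 and 1 -> [500, 800]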
The methods of training neural networks provided herein are described in detail above. It can be understood that, to implement the above functions, the apparatus for training a neural network includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art will readily appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality differently for each particular application, but such implementations should not be considered beyond the scope of the present application.
The present application may divide the apparatus for training a neural network into functional units according to the above method examples; for example, each function may be assigned its own functional unit, or two or more functions may be integrated into one processing unit. The integrated unit can be implemented in the form of hardware or of a software functional unit. It should be noted that the division of units in the present application is schematic and is only a division by logical function; other divisions are possible in actual implementation.
In the case of an integrated unit, FIG. 7 shows a schematic diagram of a possible structure of the apparatus for training a neural network provided in the present application. The apparatus 700 comprises a processing unit 701, which is configured to control the apparatus 700 to perform the steps of the method shown in FIG. 6; the processing unit 701 may also be used to perform other processes of the techniques described herein. The apparatus 700 may further comprise an input/output unit 702 for communicating with other devices (e.g., user equipment) and a storage unit 703 for storing program code and data of the apparatus 700.
For example, the processing unit 701 is configured to perform:
determining the number of training parameters of a neural network training task;
determining target training resources from a training resource library according to the number of the training parameters, wherein the training resource library comprises at least one training resource, a corresponding relation exists between the at least one training resource and the number of the at least one parameter, the at least one training resource comprises the target training resources, and the number of the at least one parameter comprises the number of the training parameters of the neural network training task;
and executing the neural network training task through the target training resource.
The processing unit 701 may be a processor or a controller, such as a CPU, a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor may also be a combination of computing functions, for example one or more microprocessors, or a DSP combined with a microprocessor. The input/output unit 702 is, for example, a communication interface, and the storage unit 703 may be a memory.
When the processing unit 701 is a processor, the input/output unit 702 is a communication interface, and the storage unit 703 is a memory, the apparatus for training a neural network according to the present application may be the apparatus shown in fig. 8.
Referring to fig. 8, the apparatus 800 includes: a processor 801, a communication interface 802 (optional), and a memory 803 (optional). The processor 801, the communication interface 802, and the memory 803 may communicate with each other via internal connection paths to transfer control and/or data signals.
It is clear to those skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the apparatuses and units described above; details are not repeated here.
Therefore, the apparatus for training a neural network provided herein can determine the target training resource from the training resource library according to the neural network training task and complete the task without providing the infrastructure of the resource pool to the user, which reduces the risk caused by exposing the resource-pool infrastructure and improves the security of the data center.
In addition, the user does not need to determine which training resources are needed to complete the neural network training task; the user only needs to send the requirement to the data center, which improves user satisfaction.
Referring to fig. 9, the present application also provides a system architecture 200 for training a neural network.
The server 210 is configured with an input/output (I/O) interface 212 to exchange data with an external device (e.g., the client device 230). A "user" may input neural network training tasks to the I/O interface 212 via the client device 230. The server 210 is, for example, a data center.
The server 210 may call data, code, and the like in the data storage system 240, and may also store data, instructions, and the like in the data storage system 240.
The processor 211 may train the neural network using the method 600 shown in FIG. 6; the specific processing may be as described in relation to FIG. 6.
The training device 220 trains the neural network according to the commands of the processor 211; the training device 220 comprises, for example, the computing units shown in FIG. 1. Since the training device 220 processes the neural network training task, it may also be considered a processor of the server 210.
Finally, the I/O interface 212 returns the processing result (e.g., the trained neural network) to the client device 230 for presentation to the user.
In the case shown in FIG. 9, the user may manually specify the data to be input to the server 210, for example by operating in an interface provided by the I/O interface 212. Alternatively, the client device 230 may automatically input data to the I/O interface 212 and obtain the results; if such automatic input requires the user's authorization, the user may set the corresponding permissions in the client device 230. The user may view the results output by the server 210 at the client device 230, where they may be presented in a particular form, for example displayed on a screen. The client device 230 may also act as a data collection point, storing collected data (e.g., training samples) in the data storage system 240.
It should be noted that FIG. 9 is only a schematic diagram of a system architecture provided by an embodiment of the present invention; the positional relationships between the devices and modules shown in the figure do not limit the technical solution of the present application. For example, in FIG. 9 the data storage system 240 is an external memory relative to the server 210; optionally, the data storage system 240 may instead be disposed in the server 210. Similarly, the training device 220 may also be located in the server 210.
In the embodiments of the present application, the sequence numbers of the processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation of the present application.
In addition, the term "and/or" herein describes only an association relationship between associated objects and indicates that three relationships are possible; for example, A and/or B may mean: A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.
The steps of a method or algorithm described in connection with this disclosure may be embodied in hardware or in software instructions executed by a processor. The software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take the form, wholly or partially, of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in this application are generated, in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or magnetic tape), an optical medium (e.g., a digital versatile disc (DVD)), or a semiconductor medium (e.g., a solid-state drive (SSD)), etc.
The above further describes the objects, technical solutions, and beneficial effects of the present application in detail. It should be understood that the above embodiments are only examples of the present application and are not intended to limit its scope; any modification, equivalent substitution, improvement, or the like made on the basis of the technical solutions of the present application shall fall within the scope of the present application.

Claims (13)

  1. A method of training a neural network, comprising:
    determining the number of training parameters of a neural network training task;
    determining a target training resource from a training resource library according to the number of the training parameters, wherein the training resource library comprises at least one training resource, a corresponding relation exists between the at least one training resource and at least one parameter number, the at least one training resource comprises the target training resource, and the at least one parameter number comprises the number of the training parameters of the neural network training task;
    and executing the neural network training task through the target training resource.
  2. The method of claim 1, wherein before determining the target training resource from the training resource pool according to the number of training parameters, the method further comprises:
    establishing the training resource library, wherein the target training resource comprises a plurality of computing units and transmission links among the computing units, and the corresponding relationship comprises an association relationship among the following three: the target training resource, the at least one parameter quantity, and a parameter update rate for the at least one parameter quantity.
  3. The method of claim 2, wherein the establishing the training repository comprises:
    updating a plurality of neural network parameters through the target training resources, wherein the number of the plurality of neural network parameters is any one of the number of the at least one parameter;
    determining a parameter update rate of the plurality of neural network parameters according to update completion times of the plurality of neural network parameters, the parameter update rate of the plurality of neural network parameters being inversely proportional to the update completion times of the plurality of neural network parameters;
    and storing the corresponding relation between the parameter updating rate of the plurality of neural network parameters, the number of the plurality of neural network parameters and the target training resource.
  4. The method of claim 2 or 3, wherein the neural network training task further comprises a training model of the neural network training task and a specified sample iteration number, the sample iteration number being the number of training samples required to be input for updating the parameters once,
    the determining the target training resource from the training resource library according to the number of the training parameters includes:
    determining at least one candidate training resource corresponding to the number of the training parameters from the training resource library according to the corresponding relation;
    testing the training model on the at least one candidate training resource, and determining a parameter generation rate of the at least one candidate training resource;
    determining the preferred sample iteration number of the at least one candidate training resource according to the parameter generation rate, wherein the preferred sample iteration number is the sample iteration number of the candidate training resource when the parameter generation rate of the candidate training resource matches the parameter update rate;
    determining a candidate training resource with a preferred sample iteration number closest to the specified sample iteration number from the at least one candidate training resource as the target training resource.
  5. The method of claim 2 or 3, wherein the neural network training task further comprises a training model of the neural network training task,
    the determining the target training resource from the training resource library according to the number of the training parameters includes:
    determining at least one candidate training resource corresponding to the number of the training parameters from the training resource library according to the corresponding relation;
    testing the training model on the at least one candidate training resource, and determining a parameter generation rate of the at least one candidate training resource;
    determining the preferred sample iteration number of the at least one candidate training resource according to the parameter generation rate, wherein the preferred sample iteration number is the sample iteration number of the candidate training resource when the parameter generation rate of the candidate training resource matches the parameter update rate;
    and determining the candidate training resource with the largest preferred sample iteration number as the target training resource from the at least one candidate training resource.
  6. The method according to any one of claims 1 to 5, wherein, in the target training resource, the number of training samples carried by any one of a plurality of computing units is proportional to the parameter update rate of that computing unit.
  7. An apparatus for training a neural network, comprising a processing unit configured to:
    determining the number of training parameters of a neural network training task;
    determining a target training resource from a training resource library according to the number of the training parameters, wherein the training resource library comprises at least one training resource, a corresponding relation exists between the at least one training resource and at least one parameter number, the at least one training resource comprises the target training resource, and the at least one parameter number comprises the number of the training parameters of the neural network training task;
    and executing the neural network training task through the target training resource.
  8. The apparatus of claim 7, wherein the processing unit is further configured to:
    establishing the training resource library, wherein the target training resource comprises a plurality of computing units and transmission links among the computing units, and the corresponding relationship comprises an association relationship among the following three: the target training resource, the at least one parameter quantity, and a parameter update rate for the at least one parameter quantity.
  9. The apparatus according to claim 8, wherein the processing unit is specifically configured to:
    updating a plurality of neural network parameters through the target training resources, wherein the number of the plurality of neural network parameters is any one of the number of the at least one parameter;
    determining a parameter update rate of the plurality of neural network parameters according to update completion times of the plurality of neural network parameters, the parameter update rate of the plurality of neural network parameters being inversely proportional to the update completion times of the plurality of neural network parameters;
    and storing the corresponding relation between the parameter updating rate of the plurality of neural network parameters, the number of the plurality of neural network parameters and the target training resource.
  10. The apparatus of claim 8 or 9, wherein the neural network training task further comprises a training model of the neural network training task and a specified number of sample iterations, the number of sample iterations being the number of training samples required to be input for updating a parameter once,
    the processing unit is specifically configured to:
    determining at least one candidate training resource corresponding to the number of the training parameters from the training resource library according to the corresponding relation;
    testing the training model on the at least one candidate training resource, and determining a parameter generation rate of the at least one candidate training resource;
    determining the preferred sample iteration number of the at least one candidate training resource according to the parameter generation rate, wherein the preferred sample iteration number is the sample iteration number of the candidate training resource when the parameter generation rate of the candidate training resource matches the parameter update rate;
    determining a candidate training resource with a preferred sample iteration number closest to the specified sample iteration number from the at least one candidate training resource as the target training resource.
  11. The apparatus of claim 8 or 9, wherein the neural network training task further comprises a training model of the neural network training task,
    the processing unit is specifically configured to:
    determining at least one candidate training resource corresponding to the number of the training parameters from the training resource library according to the corresponding relation;
    testing the training model on the at least one candidate training resource, and determining a parameter generation rate of the at least one candidate training resource;
    determining the preferred sample iteration number of the at least one candidate training resource according to the parameter generation rate, wherein the preferred sample iteration number is the sample iteration number of the candidate training resource when the parameter generation rate of the candidate training resource matches the parameter update rate;
    and determining the candidate training resource with the largest preferred sample iteration number as the target training resource from the at least one candidate training resource.
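The only difference from claim 10 is the selection rule: with no user-specified iteration number, the candidate that sustains the largest preferred sample iteration number is taken. Continuing the same hypothetical sketch:

    def choose_target_largest(preferred):
        """Return the candidate training resource with the largest preferred
        sample iteration number (largest sustainable batch per update)."""
        return max(preferred, key=preferred.get)

Claim 10 thus honors an explicit batch-size preference, while claim 11 defaults to the resource with the most throughput headroom.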
  12. The apparatus according to any one of claims 7 to 11, wherein, in the target training resource, the number of training samples carried by any one of the plurality of computing units is proportional to the parameter update rate of that computing unit.
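A minimal sketch of this proportional load balancing, assuming each computing unit's update rate has already been measured (names are hypothetical; remainder handling after rounding is omitted for brevity):

    def allocate_samples(total_samples, unit_update_rates):
        """Distribute training samples across computing units in proportion
        to each unit's parameter update rate (faster units carry more)."""
        total_rate = sum(unit_update_rates.values())
        return {unit: round(total_samples * rate / total_rate)
                for unit, rate in unit_update_rates.items()}

    # e.g. allocate_samples(1000, {"u0": 2.0, "u1": 1.0, "u2": 1.0})
    #      -> {"u0": 500, "u1": 250, "u2": 250}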
  13. A system for training a neural network, comprising a processor, a plurality of computing units, and a memory, wherein the processor is configured to perform the method of any one of claims 1-6 based on instructions stored in the memory and to determine the target training resource; and
    the plurality of computing units are configured to execute the neural network training task through the target training resource.
CN201880095511.2A 2018-09-30 2018-09-30 Method and apparatus for training neural network Pending CN112400160A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/109212 WO2020062303A1 (en) 2018-09-30 2018-09-30 Method and apparatus for training neural network

Publications (1)

Publication Number Publication Date
CN112400160A true CN112400160A (en) 2021-02-23

Family

ID=69950962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880095511.2A Pending CN112400160A (en) 2018-09-30 2018-09-30 Method and apparatus for training neural network

Country Status (2)

Country Link
CN (1) CN112400160A (en)
WO (1) WO2020062303A1 (en)


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460453B (en) * 2017-02-21 2022-05-17 阿里巴巴集团控股有限公司 Data processing method, device and system for CTC training
CN108280514B (en) * 2018-01-05 2020-10-16 中国科学技术大学 FPGA-based sparse neural network acceleration system and design method
CN108364063B (en) * 2018-01-24 2019-09-27 福州瑞芯微电子股份有限公司 A kind of neural network training method and device based on weight distribution resource

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160174902A1 (en) * 2013-10-17 2016-06-23 Siemens Aktiengesellschaft Method and System for Anatomical Object Detection Using Marginal Space Deep Neural Networks
WO2017127976A1 (en) * 2016-01-25 2017-08-03 华为技术有限公司 Method for training and scheduling incremental learning cloud system and related device
CN106201723A (en) * 2016-07-13 2016-12-07 浪潮(北京)电子信息产业有限公司 The resource regulating method of a kind of data center and device
CN107688493A (en) * 2016-08-05 2018-02-13 阿里巴巴集团控股有限公司 Train the method, apparatus and system of deep neural network
US20180068652A1 (en) * 2016-09-05 2018-03-08 Kabushiki Kaisha Toshiba Apparatus and method for training a neural network language model, speech recognition apparatus and method
US20180174049A1 (en) * 2016-12-19 2018-06-21 Siemens Healthcare Gmbh Method and computer for determination of a training function for generating annotated training images

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王建飞; 亢良伊; 刘杰; 叶丹: "topkSVRG: a distributed stochastic variance-reduced gradient descent algorithm" (in Chinese), 计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), no. 07, 18 September 2017 (2017-09-18), pages 32-39 *

Also Published As

Publication number Publication date
WO2020062303A1 (en) 2020-04-02

Similar Documents

Publication Publication Date Title
JP7087079B2 (en) Robust gradient weight compression scheme for deep learning applications
US11030521B2 (en) Estimating cardinality selectivity utilizing artificial neural networks
US20240013098A1 (en) Data processing system and method
WO2018099084A1 (en) Method, device, chip and system for training neural network model
JP7287397B2 (en) Information processing method, information processing apparatus, and information processing program
US20230206132A1 (en) Method and Apparatus for Training AI Model, Computing Device, and Storage Medium
CN108809694A (en) Arranging service method, system, device and computer readable storage medium
US10353802B2 (en) Debugging a live streaming application
JP2023545765A (en) Learning-based workload resource optimization for database management systems
CN111162934A (en) Business service test method and device, storage medium and electronic device
US10608907B2 (en) Open-loop control assistant to guide human-machine interaction
CN111859139A (en) Application program recommendation method and device, computing equipment and medium
AU2021359236B2 (en) Distributed resource-aware training of machine learning pipelines
US20240095529A1 (en) Neural Network Optimization Method and Apparatus
CN110825589A (en) Anomaly detection method and device for micro-service system and electronic equipment
CN114462582A (en) Data processing method, device and equipment based on convolutional neural network model
US20230334325A1 (en) Model Training Method and Apparatus, Storage Medium, and Device
CN113127357A (en) Unit testing method, device, equipment, storage medium and program product
CN111897707A (en) Method and device for optimizing business system, computer system and storage medium
CN110958178A (en) Method and device for determining shortest path between systems
CN112400160A (en) Method and apparatus for training neural network
CN116341634A (en) Training method and device for neural structure search model and electronic equipment
CN116204272A (en) Reproduction method, system and device for model training and related equipment
CN113850390A (en) Method, device, equipment and medium for sharing data in federal learning system
CN111310896A (en) Method and apparatus for training neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination