WO2020062303A1 - Method and apparatus for training a neural network - Google Patents

Method and apparatus for training a neural network

Info

Publication number
WO2020062303A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
resource
neural network
training resource
parameter
Prior art date
Application number
PCT/CN2018/109212
Other languages
English (en)
French (fr)
Inventor
张丰伟
沈灿泉
邵云峰
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201880095511.2A (CN112400160A)
Priority to PCT/CN2018/109212 (WO2020062303A1)
Publication of WO2020062303A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • The present application relates to the field of artificial intelligence, and in particular, to a method and an apparatus for training a neural network.
  • A neural network is a mathematical model that can arrive at solutions through learning, and it is widely applied in image recognition, speech recognition, and natural language processing. Typically, a neural network must be trained on a large number of training samples before it can be used; the number of training samples and the number of model parameters of the neural network are the main factors that constrain its training speed.
  • Training a neural network quickly requires high-performance processors, whose cost is prohibitive for individual users and small and medium-sized enterprises (SMEs). One way to address this problem is to deploy high-performance processors in the cloud to form a computing resource pool.
  • The computing resource pool provides users with a computing-resource leasing service. Users can train neural networks without purchasing high-performance processors, which solves the problem of high neural network development cost for individual users and SMEs.
  • However, the infrastructure of the computing resource pool (for example, its topology) is usually not exposed to the user, yet the training efficiency of a neural network is closely tied to that infrastructure. When the infrastructure is unclear, it is difficult for users to make an appropriate choice.
  • This application provides a method and device for training a neural network, which can provide a user with a neural network training service without exposing the computing resource pool to the user.
  • According to a first aspect, a method for training a neural network is provided, including: determining the number of training parameters of a neural network training task; determining a target training resource from a training resource library according to the number of training parameters, where the training resource library includes at least one training resource, a correspondence exists between the at least one training resource and at least one number of parameters, the at least one training resource includes the target training resource, and the at least one number of parameters includes the number of training parameters of the neural network training task; and performing the neural network training task through the target training resource.
  • According to this method, a data center can determine a target training resource from a training resource library according to a neural network training task, and can complete the task without providing the resource pool's infrastructure to the user, thereby reducing the risk caused by exposing that infrastructure and improving the security of the data center.
  • The user also does not need to determine which training resources are required to complete the neural network training task; the user only needs to send the requirements to the data center, which improves user satisfaction.
  • Optionally, the method further includes: establishing the training resource library, where the target training resource includes a plurality of computing units and transmission links between the plurality of computing units.
  • The correspondence includes an association among the following three: the target training resource, the at least one number of parameters, and a parameter update rate of the at least one number of parameters.
  • the data center can establish a training resource library by itself through testing, so that it can obtain a training resource library that matches the actual situation of the data center.
  • Establishing the training resource library includes: updating a plurality of neural network parameters through the target training resource, where the quantity of the plurality of neural network parameters is any one of the at least one number of parameters; and determining the parameter update rate of the plurality of neural network parameters according to their update completion time.
  • The parameter update rate of the plurality of neural network parameters is inversely proportional to their update completion time.
  • The data center can use small batches of data to update different numbers of neural network parameters on different training resources, obtain multiple parameter update rates, and record the association among training resources, numbers of parameters, and parameter update rates, thereby obtaining the training resource library.
  • For a fixed number of parameters, the shorter the update completion time, the faster the update rate; the longer the update completion time, the slower the update rate.
  • Optionally, the neural network training task further includes a training model of the neural network training task and a specified sample iteration count, where the sample iteration count is the number of training samples that must be input to update the parameters once.
  • In this case, determining the target training resource from the training resource library according to the number of training parameters includes: determining, according to the correspondence, at least one candidate training resource corresponding to the number of training parameters from the training resource library; testing the training model on the at least one candidate training resource to determine its parameter generation rate; determining a preferred sample iteration count of the at least one candidate training resource according to the parameter generation rate, where the preferred sample iteration count is the sample iteration count of a candidate training resource when its parameter generation rate matches its parameter update rate; and determining, from the at least one candidate training resource, the candidate training resource whose preferred sample iteration count is closest to the specified sample iteration count as the target training resource.
  • If the user specifies the training model and the sample iteration count, a target training resource that meets the user's needs can be determined according to the above scheme.
  • In some cases, the user understands the characteristics of the training model better than the data center does and can specify the sample iteration count according to those characteristics; the above scheme can therefore improve the training efficiency of the neural network.
  • The user can also specify a suitable sample iteration count according to the budget.
  • Optionally, the neural network training task further includes a training model of the neural network training task but no specified sample iteration count.
  • In this case, determining the target training resource proceeds as above, except that the candidate training resource with the largest preferred sample iteration count is determined from the at least one candidate training resource as the target training resource.
  • If the user does not specify a sample iteration count, a target training resource that meets the user's needs can still be determined according to this scheme.
  • Optionally, in the target training resource, the number of training samples carried by any one of the plurality of computing units is directly proportional to the parameter update rate of that computing unit.
  • This scheme makes it possible to reasonably allocate the number of samples carried by each computing unit of the target training resource.
  • According to a second aspect, the present application provides an apparatus for training a neural network, which can implement the functions corresponding to each step of the method of the first aspect; the functions can be implemented by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more units or modules corresponding to the functions described above.
  • the apparatus includes a processor configured to support the apparatus to perform a corresponding function in the method according to the first aspect.
  • the device may also include a memory for coupling to the processor, which stores program instructions and data necessary for the device.
  • the apparatus further includes a communication interface, which is used to support communication between the apparatus and other devices.
  • According to a third aspect, the present application provides a computer program product including computer program code that, when run by a processor of an apparatus for training a neural network (for example, a server), causes the apparatus to perform the method of the first aspect.
  • According to a fourth aspect, the present application provides a computer storage medium for storing the computer software instructions used by the above apparatus for training a neural network, including a program designed to perform the method of the first aspect.
  • According to a fifth aspect, the present application provides a system for training a neural network, including the apparatus of the second aspect, the computer program product of the third aspect, and the computer storage medium of the fourth aspect.
  • FIG. 1 is a schematic diagram of a ring applicable to the present application
  • FIG. 2 is a schematic diagram of an initial state of each ring computing unit executing a ring aggregation algorithm
  • FIG. 3 is a schematic diagram of a step of the ring aggregation algorithm
  • FIG. 4 is a schematic diagram of another step of the ring aggregation algorithm
  • FIG. 5 is a schematic diagram of an end state of each ring computing unit performing a ring aggregation algorithm
  • FIG. 6 is a schematic diagram of a method for training a neural network provided by the present application.
  • FIG. 7 is a schematic diagram of a device for training a neural network provided by the present application.
  • FIG. 8 is a schematic diagram of another apparatus for training a neural network provided by the present application.
  • FIG. 9 is a schematic diagram of a system for training a neural network provided by the present application.
  • To improve the training efficiency of neural networks (especially deep neural networks), one method is to train with a distributed training algorithm, whose flow is as follows (a minimal sketch of one such iteration is given after this list):
  • 1. In a cluster of multiple computing units (also called "computing nodes"), each computing unit independently completes the computation on its own mini-batch of training data to obtain a gradient;
  • 2. All computing units in the cluster aggregate the computed gradients to form an aggregated gradient;
  • 3. The aggregated gradient is distributed to every computing unit in the cluster;
  • 4. Each computing unit computes new neural network parameters based on the aggregated gradient, combined with hyper-parameters such as the learning rate, where the neural network parameters are the parameters that make up the neural network model and may be simply called "parameters";
  • 5. All computing units can start the next round of iterative computation only after obtaining the new parameters.
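The following is a minimal sketch of one such data-parallel iteration. It uses a linear least-squares model purely for illustration; the model, the variable names, and the learning rate are assumptions, not part of this application.

```python
import numpy as np

def local_gradient(params, x_batch, y_batch):
    # Step 1: each computing unit derives a gradient from its own mini-batch
    # (here: gradient of mean squared error for a linear model).
    pred = x_batch @ params
    return 2 * x_batch.T @ (pred - y_batch) / len(x_batch)

def train_step(params, batches, lr=0.01):
    grads = [local_gradient(params, x, y) for x, y in batches]
    agg = np.mean(grads, axis=0)       # steps 2-3: aggregate and share the gradient
    return params - lr * agg           # step 4: new parameters from the learning rate

rng = np.random.default_rng(0)
params = rng.normal(size=4)
batches = [(rng.normal(size=(32, 4)), rng.normal(size=32)) for _ in range(5)]
params = train_step(params, batches)   # step 5: all units start the next iteration
```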
  • To aggregate gradients efficiently, the ring aggregation (ring all-reduce) algorithm is commonly used in academia and industry.
  • The logical structure of the ring is shown in FIG. 1.
  • The ring 100 includes five computing units located in one system, where the system is a cluster formed by one or more devices.
  • Each computing unit may be a separate apparatus or device, or multiple computing units may be located in one apparatus or device.
  • The apparatus or device may be any of various electronic devices, including but not limited to servers, mainframes, minicomputers, portable computers, or terminals.
  • Each unit may be a computing element in an apparatus or device, such as a chip, a chipset, or a circuit board carrying a chip or chipset.
  • The computing unit may be a neural-network processing unit (NPU), a graphics processing unit (GPU), a central processing unit (CPU), a field-programmable gate array (FPGA), or another processor.
  • The five computing units shown in FIG. 1 may be chips of the same type or of different types.
  • Each computing unit has one predecessor unit and one successor unit, and the position of each computing unit in the ring is determined by the ring's creator (for example, user software).
  • For example, the predecessor unit of computing unit 0 is computing unit 4, and the successor unit of computing unit 0 is computing unit 1.
  • Each computing unit can receive data from its predecessor unit and send its own data to its successor unit.
  • In the preparation phase of the ring aggregation algorithm, the creator of the ring 100 (for example, user software) sends control information to each computing unit and slices the data: the gradient data computed by each computing unit is divided equally into five chunks.
  • For example, the gradient data computed by the five computing units shown in FIG. 1 are a, b, c, d, and e.
  • Each computing unit holds the complete data it computed; the initial state of the five computing units is shown in FIG. 2.
  • The five computing units then enter the scatter-reduce phase: each computing unit sends one chunk of its own data to its successor unit and aggregates the data received from its predecessor unit with the data it stores.
  • FIG. 3 shows one step of the scatter-reduce phase.
  • In this step, computing unit 0 sends chunk a0 to computing unit 1; after receiving a0, computing unit 1 aggregates a0 with its own chunk a1.
  • At the same time, computing unit 1 sends chunk b1 to computing unit 2; after receiving b1, computing unit 2 aggregates b1 with its own chunk b2.
  • The other computing units operate similarly.
  • FIG. 4 shows another step of the scatter-reduce phase, taking computing unit 0 as an example.
  • Computing unit 0 receives the data b4+b3+b2+b1 from its predecessor unit (computing unit 4) and aggregates it with its own data b0, obtaining the result b0+b1+b2+b3+b4.
  • While receiving b4+b3+b2+b1, computing unit 0 sends its stored data c0+c4+c3+c2 to its successor unit (computing unit 1) so that the successor can perform its own gradient aggregation.
  • After the scatter-reduce phase completes, the ring aggregation algorithm proceeds to the next step, the all-gather phase.
  • In this phase, the ring 100 sends the final result held by each computing unit to the other computing units in four passes.
  • For example, the final result of computing unit 0's aggregation of data b is b0+b1+b2+b3+b4; computing unit 0 passes this result to computing unit 1, computing unit 1 passes it to computing unit 2, and so on.
  • After four passes, every computing unit holds the final aggregation result for data b.
  • Similarly, for the other four pieces of data (a, c, d, and e), after four passes every computing unit also holds the final aggregation result for each, as shown in FIG. 5.
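The scatter-reduce and all-gather phases can be simulated in a few lines. The sketch below is illustrative only (the function name and the chunk layout are assumptions); it shows that after n-1 scatter-reduce passes and n-1 all-gather passes, every unit holds every fully aggregated chunk.

```python
import numpy as np

def ring_all_reduce(per_unit_chunks):
    """Simulate ring all-reduce: per_unit_chunks[i] is the list of n chunks
    that unit i computed (its gradient, pre-sliced into n chunks)."""
    n = len(per_unit_chunks)
    state = [[c.copy() for c in unit] for unit in per_unit_chunks]
    # Scatter-reduce: in pass p, unit i sends chunk (i - p) % n to its
    # successor, which adds it to its own copy of that chunk.
    for p in range(n - 1):
        sends = [((i - p) % n, state[i][(i - p) % n].copy()) for i in range(n)]
        for i, (j, chunk) in enumerate(sends):
            state[(i + 1) % n][j] += chunk
    # All-gather: unit i forwards its fully aggregated chunk (i + 1 - p) % n
    # to its successor, which overwrites its own copy.
    for p in range(n - 1):
        sends = [((i + 1 - p) % n, state[i][(i + 1 - p) % n].copy()) for i in range(n)]
        for i, (j, chunk) in enumerate(sends):
            state[(i + 1) % n][j] = chunk
    return state

rng = np.random.default_rng(0)
grads = [rng.normal(size=10) for _ in range(5)]        # gradients of 5 units
out = ring_all_reduce([np.array_split(g, 5) for g in grads])
assert all(np.allclose(np.concatenate(u), sum(grads)) for u in out)
```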
  • As the above training algorithm shows, two factors affect training efficiency in a distributed training scheme: the computing capability of each computing unit, for example the rate at which a unit processes a fixed number of training samples to generate gradients, and the transmission capability between computing units, for example the rate at which gradients are transmitted between two computing units.
  • For a data center that provides a computing resource pool (hereinafter simply a "resource pool"), both the computing capability of the computing units and the transmission rate between the computing units are training resources.
  • the method for training a neural network provided by the present application will be described in detail using the ring 100 as an example. It should be noted that the method provided in this application is not limited to the ring distributed architecture shown in FIG. 1, and the method provided in this application can be applied to any distributed training architecture, for example, a reduce-tree.
  • FIG. 6 shows a schematic diagram of a method for training a neural network provided by the present application.
  • In the method shown in FIG. 6, the data center includes three modules: a training module, an adaptive module, and a resource library management module.
  • These three modules are divided by function only; they may be independent modules or sub-modules of one module, and they may be hardware circuits or software programs. This application does not limit their specific form.
  • the data center can provide users with neural network training services by performing the following steps.
  • Before providing a neural network training service for users, the data center must first determine the correspondence between training resources and the number of neural network parameters (abbreviated as the "number of parameters", where neural network parameters may be abbreviated as "parameters"), that is, establish the training resource library.
  • The training resource library refers to a database containing the above correspondence.
  • The correspondence is not limited to the correspondence between training resources and numbers of parameters.
  • For example, it may also include training resources, numbers of parameters, and the parameter update rates corresponding to those numbers of parameters.
  • The data center can determine the above correspondence through testing (that is, probing).
  • For example, the data center may obtain the ring 100 shown in FIG. 1 from the resource pool. For a group of parameters of fixed quantity, the data center may deploy the group on the ring 100 to run an update test and derive the parameter update rate from the group's update completion time.
  • Testing different numbers of parameters on the ring 100 yields the association between the ring 100, different numbers of parameters, and different parameter update rates.
  • Optionally, the data center may input different numbers of training samples (including adjusting the number of training samples input by each computing unit) to obtain different parameter update rates, and save the preferred parameter update rate to the training resource library.
  • The preferred parameter update rate refers to the parameter update rate at which the parameter generation rate of the training resource matches its parameter transmission rate, and it corresponds to a preferred sample iteration count.
  • For example, when a fixed number of parameters is tested on the ring 100, inputting 1000 samples at a time yields parameter update rate A, inputting 1500 samples yields rate B, and inputting 2000 samples yields rate C. If B is the largest of the three, B is taken as the parameter update rate of the ring 100, and 1500 is the preferred sample iteration count of the ring 100.
  • The sample iteration count is the number of training samples that must be input to update the parameters once.
  • A may be smaller than B because the number of input samples is small, so the computing capability (parameter generation rate) of the ring 100 falls below its transmission capability (parameter transmission rate); C may be smaller than B because the number of input samples is too large, so the ring 100's computing capability exceeds its transmission capability. Therefore, the parameter update rate of a training resource is fastest only when its computing capability and transmission capability match (are the same or approximately the same).
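A sketch of this probing loop, under the assumption of a callable `measure_update_time` that runs one parameter update on a resource and returns its completion time; the function name and candidate batch sizes are illustrative, not part of this application.

```python
def probe_preferred_batch(resource, param_count, candidate_batches,
                          measure_update_time):
    """Find the sample iteration count at which the parameter update rate
    peaks, i.e. where computing and transmission capability roughly match."""
    best_batch, best_rate = None, 0.0
    for batch in candidate_batches:              # e.g. [1000, 1500, 2000]
        t = measure_update_time(resource, param_count, batch)
        rate = 1.0 / t                           # update rate is inverse of time
        if rate > best_rate:
            best_batch, best_rate = batch, rate
    return best_batch, best_rate                 # preferred count and its rate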
  • After the test is completed, the resource library management module records the correspondence among the number of parameters, the training resource, and the parameter update rate, thereby establishing the training resource library. The correspondence may take the following form.
  • Correspondence 1: [ring 0: GPU0, GPU1, GPU2; (parameter update rate 11, parameter quantity 11), (parameter update rate 12, parameter quantity 12), (parameter update rate 13, parameter quantity 13)].
  • Correspondence 2: [ring 1: GPU1, GPU2, GPU3; (parameter update rate 21, parameter quantity 21), (parameter update rate 22, parameter quantity 22), (parameter update rate 23, parameter quantity 23)].
  • Correspondence 3: [ring 2: GPU0, GPU2, GPU3; (parameter update rate 31, parameter quantity 31), (parameter update rate 32, parameter quantity 32), (parameter update rate 33, parameter quantity 33)].
  • Parameter quantities within the same correspondence differ from one another; parameter quantities in different correspondences may be the same or different.
  • For example, parameter quantity 11, parameter quantity 12, and parameter quantity 13 differ from one another, while parameter quantity 11, parameter quantity 21, and parameter quantity 31 may be the same or different.
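One way such a library could be represented in code. The structure, field names, and all numeric values below are illustrative assumptions, not data from this application.

```python
# Each record associates a training resource (a ring of computing units)
# with (parameter quantity, measured parameter update rate) pairs.
resource_library = {
    "ring0": {"units": ["GPU0", "GPU1", "GPU2"],
              "entries": [(1_000_000, 4.2), (5_000_000, 1.3)]},
    "ring1": {"units": ["GPU1", "GPU2", "GPU3"],
              "entries": [(1_000_000, 3.8), (5_000_000, 1.1)]},
}

def candidate_resources(library, param_count):
    # Query step: return every resource that has a record (a correspondence)
    # for the requested number of parameters.
    return {name: rec for name, rec in library.items()
            if any(n == param_count for n, _ in rec["entries"])}

print(candidate_resources(resource_library, 5_000_000))  # both rings qualify
```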
  • It should be understood that S601 is only an optional implementation of the technical solution of this application.
  • In some cases, the data center does not need to perform S601.
  • For example, the manufacturer of the computing units may pre-configure the correspondence in the data center's resource library based on empirical data.
  • The data center determines the user's needs according to the training task, such as the number of parameters of the neural network to be trained (that is, the "number of training parameters" in the claims). The user's needs can also include other information.
  • a user may specify a training model of a neural network.
  • The data center first determines at least one candidate training resource from the training resource library according to the number of training parameters. Subsequently, the data center tests the training model on the at least one candidate training resource to obtain its parameter generation rate.
  • Testing the training model here means deploying the user-specified training model on a candidate training resource, inputting a small batch of samples, generating parameters (for example, gradients), and obtaining a parameter generation rate (for example, a gradient generation rate).
  • The data center then determines the preferred sample iteration count of the at least one candidate training resource according to the parameter generation rate; the preferred sample iteration count is the sample iteration count of a candidate training resource when its parameter generation rate matches its parameter update rate.
  • Because different training models have different complexity, the sample iteration count at which the parameter update rate peaks on the same training resource differs across training models. The preferred sample iteration count of a candidate training resource therefore cannot be pre-stored in the training resource library; the user-specified training model must be tested to determine it.
  • The test process is as follows: the user-specified training model is deployed on the candidate training resource, and different numbers of samples are input.
  • When the actual parameter generation rate matches (equals or approximately equals) the parameter update rate stored for the candidate training resource in the training resource library, the number of samples input to the training model is the preferred sample iteration count of that candidate training resource.
  • The preferred sample iteration counts of multiple candidate training resources are tested, and the candidate training resource with the largest preferred sample iteration count is determined from them as the target training resource.
  • If the user has specified a sample iteration count, the data center determines, from the multiple candidate training resources, the candidate training resource whose preferred sample iteration count is closest to the user-specified count as the target training resource.
  • For example, suppose the preferred sample iteration count of candidate training resource A is 5 and that of candidate training resource B is 8.
  • If the user specifies a sample iteration count of 7, candidate training resource B is determined as the target training resource (8 is closest to 7); if the user specifies 6, candidate training resource A is determined as the target training resource (5 is closest to 6).
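A compact sketch of this selection rule; the function name and the dictionary layout are assumptions.

```python
def pick_target_resource(preferred_iters, specified=None):
    """preferred_iters maps a candidate resource to its preferred sample
    iteration count. With a user-specified count, pick the closest; without
    one, pick the resource with the largest preferred count."""
    if specified is None:
        return max(preferred_iters, key=preferred_iters.get)
    return min(preferred_iters, key=lambda r: abs(preferred_iters[r] - specified))

prefs = {"A": 5, "B": 8}                      # the example from the text
assert pick_target_resource(prefs, specified=7) == "B"
assert pick_target_resource(prefs, specified=6) == "A"
assert pick_target_resource(prefs) == "B"     # no count specified
```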
  • As an optional example, the user can specify the training model and a model training rate in the training task according to the budget.
  • When the budget is high, a faster model training rate can be specified; when the budget is low, a slower model training rate can be specified.
  • The data center can determine, through small-batch data testing, the training resource that matches the training rate required by the user as the target training resource.
  • the user can also specify a training model and training resources for the neural network.
  • the data center determines target training resources according to user needs, so that it can satisfy different users and improve user satisfaction.
  • the data center determines a training resource (that is, a target training resource) corresponding to the demand from the training resource database according to the above requirements. For example, execute S605 and S606.
  • S605: Query the resource library according to the requirements to obtain candidate training resources. S606: Determine the target training resource from the candidate training resources.
  • In S605, the adaptive module may send a query message to the resource library management module.
  • After receiving the query message, the resource library management module queries the resource library for one or more training resources corresponding to the user's needs (for example, the number of training parameters), that is, it obtains at least one candidate training resource.
  • The resource library management module then sends an information list containing the at least one candidate training resource to the adaptive module, and the adaptive module determines the target training resource from the list.
  • For example, the adaptive module can determine the target training resource from the candidate training resources based on the user's specific needs, following the relevant description above.
  • By performing S605 and S606, the data center can determine the target training resource from the training resource library according to the neural network training task, and the task can be completed without providing the user with the resource pool's infrastructure, thereby reducing the risk caused by exposing that infrastructure and improving the security of the data center.
  • the user does not need to determine which training resources are needed to complete the neural network training task, and only needs to send the requirements to the data center, thereby improving the satisfaction of the user experience.
  • After the adaptive module determines the target resource, S607 may be executed: the adaptive module sends information about the target training resource to the training module.
  • The information about the target training resource is, for example, the type and number of the computing units, the transmission links between the computing units, and the preferred sample iteration count of the target training resource.
  • After receiving this information, the training module executes S608, performing the training task according to the information about the target training resource.
  • Optionally, the training module may adjust the number of samples deployed on each computing unit to obtain a preferred training rate.
  • To implement the above functions, the apparatus for training a neural network includes hardware structures and/or software modules corresponding to each function.
  • this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is performed by hardware or computer software-driven hardware depends on the specific application of the technical solution and design constraints. Professional technicians can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.
  • This application may divide the apparatus for training a neural network into functional units according to the above method example.
  • For example, each function may be divided into a separate functional unit, or two or more functions may be integrated into one processing unit.
  • The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in this application is schematic and is only a logical functional division; in actual implementation there may be other division manners.
  • FIG. 7 shows a possible structure diagram of a device for training a neural network provided by the present application.
  • the apparatus 700 includes a processing unit 701.
  • the processing unit 701 is configured to control the apparatus 700 to execute the steps of the method shown in FIG. 6.
  • the processing unit 701 may also be used to perform other processes of the techniques described herein.
  • the apparatus 700 may further include an input-output unit 702 for communicating with other devices (for example, user equipment), and a storage unit 703 for storing program code and data of the apparatus 700.
  • For example, the processing unit 701 is configured to: determine the number of training parameters of a neural network training task;
  • determine a target training resource from the training resource library according to the number of training parameters, where the training resource library includes at least one training resource, a correspondence exists between the at least one training resource and at least one number of parameters, the at least one training resource includes the target training resource, and the at least one number of parameters includes the number of training parameters of the neural network training task;
  • and perform the neural network training task through the target training resource.
  • The processing unit 701 may be a processor or a controller, for example a CPU, a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the present disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination including one or more microprocessors, a combination of a DSP and a microprocessor, and so on.
  • the input / output unit 702 is, for example, a communication interface, and the storage unit 703 may be a memory.
  • When the processing unit 701 is a processor, the input/output unit 702 is a communication interface, and the storage unit 703 is a memory, the apparatus for training a neural network involved in this application may be the apparatus shown in FIG. 8.
  • the device 800 includes: a processor 801, a communication interface 802 (optional), and a memory 803 (optional).
  • the processor 801, the communication interface 802, and the memory 803 can communicate with each other through an internal connection path, and transfer control and / or data signals.
  • The apparatus for training a neural network provided in this application can determine a target training resource from the training resource library according to the training task of the neural network, and can complete the task without providing the user with the resource pool's infrastructure, thereby reducing the risk caused by exposing that infrastructure and improving the security of the data center.
  • the user does not need to determine which training resources are needed to complete the neural network training task, and only needs to send the requirements to the data center, thereby improving the satisfaction of the user experience.
  • the present application further provides a system architecture 200 for training a neural network.
  • The server 210 is configured with an input/output (I/O) interface 212 to perform data interaction with external devices (for example, the client device 230).
  • A "user" can input a neural network training task to the I/O interface 212 through the client device 230.
  • The server 210 is, for example, a data center.
  • The server 210 may call data, code, and the like in the data storage system 240, and may also store data, instructions, and the like in the data storage system 240.
  • the processor 211 may use the method 600 shown in FIG. 6 to train the neural network. For specific processing, refer to the related description in FIG. 6.
  • The training device 220 is configured to train the neural network according to commands from the processor 211.
  • The training device 220 is, for example, the computing units shown in FIG. 1; since the training device 220 is used to process the neural network training task, it may also be regarded as a processor of the server 210.
  • Finally, the I/O interface 212 returns the processing result (for example, the trained neural network) to the client device 230 and provides it to the user.
  • The user can manually specify the data input to the server 210, for example by operating in an interface provided by the I/O interface 212.
  • In another case, the client device 230 may automatically input data to the I/O interface 212 and obtain the result; if automatic input by the client device 230 requires the user's authorization, the user may set the corresponding permissions in the client device 230.
  • The user may view the result output by the server 210 on the client device 230, and the specific presentation form may be, for example, displaying the output result on a screen.
  • The client device 230 can also serve as a data collection terminal and store the collected data (for example, training samples) in the data storage system 240.
  • It is worth noting that FIG. 9 is only a schematic diagram of a system architecture provided by an embodiment of the present invention.
  • The positional relationships among the devices, components, and modules shown in the figure do not constitute any limitation on the technical solution of this application.
  • For example, the data storage system 240 is external storage relative to the server 210; optionally, the data storage system 240 may also be placed in the server 210.
  • Similarly, the training device 220 may also be placed in the server 210.
  • The sequence numbers of the processes do not imply an order of execution.
  • The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of this application.
  • the steps of the method or algorithm described in combination with the disclosure of this application may be implemented in a hardware manner, or may be implemented in a manner in which a processor executes software instructions.
  • Software instructions may consist of corresponding software modules, which can be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be an integral part of the processor.
  • the processor and the storage medium may reside in an ASIC.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted via the computer-readable storage medium.
  • The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave).
  • The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media.
  • The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method for training a neural network, comprising: determining the number of training parameters of a neural network training task; determining a target training resource from a training resource library according to the number of training parameters, where the training resource library includes at least one training resource, a correspondence exists between the at least one training resource and at least one number of parameters, the at least one training resource includes the target training resource, and the at least one number of parameters includes the number of training parameters of the neural network training task; and performing the neural network training task through the target training resource. According to this method, a data center can determine the target training resource from the training resource library according to the neural network training task and complete the task without providing the user with the infrastructure of the resource pool, thereby reducing the risk caused by exposing the resource pool's infrastructure and improving the security of the data center.

Description

Method and Apparatus for Training a Neural Network

Technical Field

This application relates to the field of artificial intelligence, and in particular, to a method and an apparatus for training a neural network.

Background

A neural network is a mathematical model that can arrive at solutions through learning; it is widely applied in fields such as image recognition, speech recognition, and natural language processing. Typically, a neural network must be trained on a large number of training samples before it can be used, and the number of training samples and the number of model parameters of the neural network are the main factors that constrain its training speed.

To speed up the training of neural networks, high-performance processors are needed; for individual users and small and medium-sized enterprises, however, high-performance processors are costly, which hinders the development and application of neural networks.

One way to address this problem is to deploy high-performance processors in the cloud to form a computing resource pool that provides users with a computing-resource leasing service. Users can then train neural networks without purchasing high-performance processors, which solves the problem of high neural network development cost for individual users and small and medium-sized enterprises.

However, the infrastructure of the computing resource pool (for example, its topology) is usually not exposed to users, while the training efficiency of a neural network is closely tied to that infrastructure. When the infrastructure is unclear, it is difficult for users to make an appropriate choice.

Summary

This application provides a method and an apparatus for training a neural network that can provide users with a neural network training service without exposing the computing resource pool to them.

According to a first aspect, a method for training a neural network is provided, including: determining the number of training parameters of a neural network training task; determining a target training resource from a training resource library according to the number of training parameters, where the training resource library includes at least one training resource, a correspondence exists between the at least one training resource and at least one number of parameters, the at least one training resource includes the target training resource, and the at least one number of parameters includes the number of training parameters of the neural network training task; and performing the neural network training task through the target training resource.

According to the method for training a neural network provided in this application, a data center can determine the target training resource from the training resource library according to the neural network training task, and the task can be completed without providing the user with the infrastructure of the resource pool, thereby reducing the risk caused by exposing the resource pool's infrastructure and improving the security of the data center.

In addition, the user does not need to determine which training resources are required to complete the neural network training task; the user only needs to send the requirements to the data center, which improves user satisfaction.
Optionally, before the target training resource is determined from the training resource library according to the number of training parameters, the method further includes: establishing the training resource library, where the target training resource includes a plurality of computing units and transmission links between the plurality of computing units, and the correspondence includes an association among the following three: the target training resource, the at least one number of parameters, and a parameter update rate of the at least one number of parameters.

The data center can establish the training resource library through its own testing, thereby obtaining a training resource library that matches the actual situation of the data center.

Optionally, establishing the training resource library includes: updating a plurality of neural network parameters through the target training resource, where the quantity of the plurality of neural network parameters is any one of the at least one number of parameters; determining the parameter update rate of the plurality of neural network parameters according to their update completion time, where the parameter update rate is inversely proportional to the update completion time; and saving the correspondence among the parameter update rate of the plurality of neural network parameters, their quantity, and the target training resource.

The data center can use small batches of data to update different numbers of neural network parameters on different training resources, obtain multiple parameter update rates, and record the association among training resources, numbers of parameters, and parameter update rates, thereby obtaining the training resource library. For a fixed number of neural network parameters, the shorter the update completion time, the faster the update rate; the longer the completion time, the slower the rate.

Optionally, the neural network training task further includes a training model of the neural network training task and a specified sample iteration count, where the sample iteration count is the number of training samples that must be input to update the parameters once.

In this case, determining the target training resource from the training resource library according to the number of training parameters includes: determining, according to the correspondence, at least one candidate training resource corresponding to the number of training parameters from the training resource library; testing the training model on the at least one candidate training resource to determine its parameter generation rate; determining a preferred sample iteration count of the at least one candidate training resource according to the parameter generation rate, where the preferred sample iteration count is the sample iteration count of a candidate training resource when its parameter generation rate matches its parameter update rate; and determining, from the at least one candidate training resource, the candidate training resource whose preferred sample iteration count is closest to the specified sample iteration count as the target training resource.

If the user specifies the training model and the sample iteration count, a target training resource that meets the user's needs can be determined according to the above scheme. In some cases, the user understands the characteristics of the training model better than the data center does and can specify the sample iteration count according to those characteristics; the above scheme can therefore improve the training efficiency of the neural network. The user can also specify a suitable sample iteration count according to the budget.

Optionally, the neural network training task further includes a training model of the neural network training task, and

determining the target training resource from the training resource library according to the number of training parameters includes: determining, according to the correspondence, at least one candidate training resource corresponding to the number of training parameters from the training resource library; testing the training model on the at least one candidate training resource to determine its parameter generation rate; determining a preferred sample iteration count of the at least one candidate training resource according to the parameter generation rate, where the preferred sample iteration count is the sample iteration count of a candidate training resource when its parameter generation rate matches its parameter update rate; and determining, from the at least one candidate training resource, the candidate training resource with the largest preferred sample iteration count as the target training resource.

If the user does not specify a sample iteration count, a target training resource that meets the user's needs can be determined according to the above scheme.

Optionally, in the target training resource, the number of training samples carried by any one of the plurality of computing units is directly proportional to the parameter update rate of that computing unit.

The above scheme makes it possible to reasonably allocate the number of samples carried by each computing unit of the target training resource.

According to a second aspect, this application provides an apparatus for training a neural network that can implement the functions corresponding to the steps of the method of the first aspect; the functions can be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more units or modules corresponding to the functions.

In a possible design, the apparatus includes a processor configured to support the apparatus in performing the corresponding functions of the method of the first aspect. The apparatus may further include a memory, coupled to the processor, that stores the program instructions and data necessary for the apparatus. Optionally, the apparatus further includes a communication interface configured to support communication between the apparatus and other devices.

According to a third aspect, this application provides a computer program product including computer program code that, when run by a processor of an apparatus for training a neural network (for example, a server), causes the apparatus to perform the method of the first aspect.

According to a fourth aspect, this application provides a computer storage medium for storing the computer software instructions used by the above apparatus for training a neural network, including a program designed to perform the method of the first aspect.

According to a fifth aspect, this application provides a system for training a neural network, including the apparatus of the second aspect, the computer program product of the third aspect, and the computer storage medium of the fourth aspect.
Brief Description of the Drawings

FIG. 1 is a schematic diagram of a ring applicable to this application;

FIG. 2 is a schematic diagram of the initial state of the computing units of the ring when executing the ring aggregation algorithm;

FIG. 3 is a schematic diagram of one step of the ring aggregation algorithm;

FIG. 4 is a schematic diagram of another step of the ring aggregation algorithm;

FIG. 5 is a schematic diagram of the end state of the computing units of the ring when executing the ring aggregation algorithm;

FIG. 6 is a schematic diagram of a method for training a neural network provided by this application;

FIG. 7 is a schematic diagram of an apparatus for training a neural network provided by this application;

FIG. 8 is a schematic diagram of another apparatus for training a neural network provided by this application;

FIG. 9 is a schematic diagram of a system for training a neural network provided by this application.
Detailed Description

To facilitate understanding of the technical solutions of this application, the concepts involved are first briefly introduced.

To improve the training efficiency of neural networks (especially deep neural networks), one method is to train with a distributed training algorithm, whose flow is as follows:

1. In a cluster of multiple computing units (also called "computing nodes"), each computing unit independently completes the computation on its own mini-batch of training data to obtain a gradient;

2. All computing units in the cluster aggregate the computed gradients to form an aggregated gradient;

3. The aggregated gradient is distributed to every computing unit in the cluster;

4. Each computing unit computes new neural network parameters based on the aggregated gradient, combined with hyper-parameters such as the learning rate, where the neural network parameters are the parameters that make up the neural network model and may be simply called "parameters";

5. All computing units can start the next round of iterative computation only after obtaining the new parameters.
To aggregate gradients efficiently, the ring aggregation (ring all-reduce) algorithm is currently common in academia and industry; the logical structure of the ring is shown in FIG. 1.

In FIG. 1, the ring 100 includes five computing units located in one system, where the system is a cluster formed by one or more devices. Each computing unit may be a separate apparatus or device, or multiple computing units may be located in one apparatus or device. The apparatus or device may be any of various electronic devices, including but not limited to servers, mainframes, minicomputers, portable computers, or terminals. Each unit may be a computing element in an apparatus or device, such as a chip, a chipset, or a circuit board carrying a chip or chipset.

The computing unit may be a neural-network processing unit (NPU), a graphics processing unit (GPU), a central processing unit (CPU), a field-programmable gate array (FPGA), or another processor. The five computing units shown in FIG. 1 may be chips of the same type or of different types.

Each computing unit has one predecessor unit and one successor unit, and the position of each computing unit in the ring is determined by the ring's creator (for example, user software). For example, the predecessor unit of computing unit 0 is computing unit 4, and the successor unit of computing unit 0 is computing unit 1. Each computing unit can receive data from its predecessor unit and send its own data to its successor unit.

Taking the ring 100 of FIG. 1 as an example, in the preparation phase of the ring aggregation algorithm, the creator of the ring 100 (for example, user software) sends control information to each computing unit and slices the data: the gradient data computed by each computing unit is divided equally into five chunks. For example, the gradient data computed by the five computing units in FIG. 1 are a, b, c, d, and e; each computing unit holds the complete data it computed, and the initial state of the five computing units is shown in FIG. 2.

The five computing units then enter the scatter-reduce phase: each computing unit sends one chunk of its own data to its successor unit and aggregates the data received from its predecessor unit with the data it stores.

FIG. 3 shows one step of the scatter-reduce phase. In this step, computing unit 0 sends chunk a0 to computing unit 1; after receiving a0, computing unit 1 aggregates a0 with its own chunk a1. At the same time, computing unit 1 sends chunk b1 to computing unit 2; after receiving b1, computing unit 2 aggregates b1 with its own chunk b2. The other computing units operate similarly.

FIG. 4 shows another step of the scatter-reduce phase. Taking computing unit 0 as an example, it receives the data b4+b3+b2+b1 from its predecessor unit (computing unit 4) and aggregates it with its own data b0, obtaining the result b0+b1+b2+b3+b4. While receiving b4+b3+b2+b1, computing unit 0 sends its stored data c0+c4+c3+c2 to its successor unit (computing unit 1) so that the successor can perform its gradient aggregation.

After the scatter-reduce phase completes, the ring aggregation algorithm proceeds to the next step, the all-gather phase. In this phase, the ring 100 sends the final result held by each computing unit to the other computing units in four passes. For example, the final result of computing unit 0's aggregation of data b is b0+b1+b2+b3+b4; computing unit 0 passes this result to computing unit 1, computing unit 1 passes it to computing unit 2, and so on. After four passes, every computing unit holds the final aggregation result for data b. Similarly, for the other four pieces of data (a, c, d, and e), after four passes every computing unit also holds the final aggregation result for each, as shown in FIG. 5.

As the above training algorithm shows, in a distributed training scheme two factors affect the training efficiency of the neural network: the computing capability of each computing unit, for example the rate at which a unit processes a fixed number of training samples to generate gradients, and the transmission capability between computing units, for example the rate at which gradients are transmitted between two units. For a data center that provides a computing resource pool (hereinafter simply a "resource pool"), both the computing capability of the computing units and the transmission rate between them are training resources.

The method for training a neural network provided by this application is described in detail below using the ring 100 as an example. It should be noted that the method is not limited to the ring distributed architecture of FIG. 1; it can be applied to any distributed training architecture, for example a reduce-tree.
FIG. 6 is a schematic diagram of a method for training a neural network provided by this application.

In the method 600 shown in FIG. 6, the data center contains three modules: a training module, an adaptive module, and a resource library management module. These three modules are divided by function only; they may be independent modules or sub-modules of one module, and they may be hardware circuits or software programs. This application does not limit the specific form of the three modules.

The data center can provide users with a neural network training service by performing the following steps.

S601: Establish a training resource library.

Before providing the neural network training service for users, the data center must first determine the correspondence between training resources and the number of neural network parameters (abbreviated as the "number of parameters", where neural network parameters may be abbreviated as "parameters"), that is, establish the training resource library. In this application, the training resource library refers to a database containing this correspondence, and the correspondence is not limited to that between training resources and numbers of parameters; for example, it may also contain training resources, numbers of parameters, and the parameter update rates corresponding to those numbers of parameters.

The correspondence can be interpreted as follows: for a group of parameters of fixed quantity, different training resources in the resource pool are used to update the group; if the update of the group completes, a correspondence is determined to exist between that training resource and the quantity of that group of parameters.

The data center can determine the correspondence through testing (that is, probing).

For example, the data center can obtain the ring 100 of FIG. 1 from the resource pool. For a group of parameters of fixed quantity, the data center can deploy the group on the ring 100 to run an update test and derive the parameter update rate from the group's update completion time. Testing different numbers of parameters on the ring 100 yields the association between the ring 100, different numbers of parameters, and different parameter update rates; testing a group of parameters of fixed quantity on different training resources yields the association between that quantity of parameters, different training resources, and different parameter update rates.

Optionally, the data center can input different numbers of training samples (including adjusting the number of training samples input by each computing unit) to obtain different parameter update rates, and save the preferred parameter update rate to the training resource library. The preferred parameter update rate is the parameter update rate at which the training resource's parameter generation rate matches its parameter transmission rate, and it corresponds to a preferred sample iteration count. For example, when a fixed number of parameters is tested on the ring 100, inputting 1000 samples at a time yields parameter update rate A, inputting 1500 samples yields rate B, and inputting 2000 samples yields rate C; if B is the largest of the three, B is taken as the parameter update rate of the ring 100, and 1500 is the preferred sample iteration count of the ring 100.

In this application, the sample iteration count is the number of training samples that must be input to update the parameters once.

In the above example, A may be smaller than B because the number of input samples is small, so the computing capability (parameter generation rate) of the ring 100 falls below its transmission capability (parameter transmission rate); C may be smaller than B because the number of input samples is too large, so the ring 100's computing capability exceeds its transmission capability. Therefore, the parameter update rate of a training resource is fastest only when its computing capability and transmission capability match (are the same or approximately the same).

After the test completes, the resource library management module records the correspondence among the number of parameters, the training resource, and the parameter update rate, thereby establishing the training resource library.

The correspondence may take the following form.

Correspondence 1: [ring 0: GPU0, GPU1, GPU2; (parameter update rate 11, parameter quantity 11), (parameter update rate 12, parameter quantity 12), (parameter update rate 13, parameter quantity 13)].

Correspondence 2: [ring 1: GPU1, GPU2, GPU3; (parameter update rate 21, parameter quantity 21), (parameter update rate 22, parameter quantity 22), (parameter update rate 23, parameter quantity 23)].

Correspondence 3: [ring 2: GPU0, GPU2, GPU3; (parameter update rate 31, parameter quantity 31), (parameter update rate 32, parameter quantity 32), (parameter update rate 33, parameter quantity 33)].

In these correspondences, the parameter quantities within the same correspondence differ, while parameter quantities in different correspondences may be the same or different. For example, parameter quantity 11, parameter quantity 12, and parameter quantity 13 differ from one another; parameter quantity 11, parameter quantity 21, and parameter quantity 31 may be the same or different.

It should be understood that S601 is only an optional implementation of the technical solution of this application; in some cases the data center does not need to perform S601. For example, the manufacturer of the computing units may pre-configure the correspondence in the data center's resource library based on empirical data.
S602: Obtain a neural network training task.

The data center determines the user's needs according to the training task, for example the number of parameters of the neural network to be trained (that is, the "number of training parameters" in the claims). The user's needs may also include other information.

For example, the user may specify a training model of the neural network. The data center first determines at least one candidate training resource from the training resource library according to the number of training parameters, and then tests the training model on the at least one candidate training resource to obtain its parameter generation rate.

Testing the training model here means deploying the user-specified training model on a candidate training resource, inputting a small batch of samples, generating parameters (for example, gradients), and obtaining a parameter generation rate (for example, a gradient generation rate).

The data center then determines the preferred sample iteration count of the at least one candidate training resource according to the parameter generation rate; the preferred sample iteration count is the sample iteration count of a candidate training resource when its parameter generation rate matches its parameter update rate.

Because different training models have different complexity, the sample iteration count at which the parameter update rate peaks on the same training resource differs across training models. The preferred sample iteration count of a candidate training resource therefore cannot be pre-stored in the training resource library; the user-specified training model must be tested to determine it.

The test process is as follows: the user-specified training model is deployed on the candidate training resource and different numbers of samples are input; when the actual parameter generation rate matches (equals or approximately equals) the parameter update rate stored for the candidate training resource in the training resource library, the number of samples input to the training model is the preferred sample iteration count of that candidate training resource.

The preferred sample iteration counts of multiple candidate training resources are tested, and the candidate training resource with the largest preferred sample iteration count is determined from them as the target training resource.

If the user has specified a sample iteration count, the data center determines, from the multiple candidate training resources, the candidate training resource whose preferred sample iteration count is closest to the user-specified count as the target training resource.

For example, suppose there are two candidate training resources, where the preferred sample iteration count of candidate training resource A is 5 and that of candidate training resource B is 8. If the user specifies a sample iteration count of 7, candidate training resource B is determined as the target training resource; if the user specifies 6, candidate training resource A is determined as the target training resource.

As an optional example, the user may specify the training model and a model training rate in the training task according to the budget: a faster model training rate can be specified when the budget is high, and a slower one when it is low. The data center can determine, through small-batch data testing, the training resource matching the training rate the user requires as the target training resource.

As another optional example, the user may also specify the training model and the training resources of the neural network.

The data center determines the target training resource according to the user's needs, and can thus satisfy different users and improve user satisfaction.
The data center determines, from the training resource library, the training resource corresponding to the above requirements (that is, the target training resource), for example by performing S605 and S606.

S605: Query the resource library according to the requirements to obtain candidate training resources.

S606: Determine the target training resource from the candidate training resources.

In S605, the adaptive module may send a query message to the resource library management module. On receiving the query message, the resource library management module queries the resource library for one or more training resources corresponding to the user's needs (for example, the number of training parameters), that is, it obtains at least one candidate training resource. The resource library management module then sends an information list containing the at least one candidate training resource to the adaptive module, and the adaptive module determines the target training resource from the list, for example based on the user's specific needs, following the relevant description above.

By performing S605 and S606, the data center can determine the target training resource from the training resource library according to the neural network training task, and the task can be completed without providing the user with the resource pool's infrastructure, thereby reducing the risk caused by exposing that infrastructure and improving the security of the data center.

In addition, the user does not need to determine which training resources are required to complete the neural network training task; the user only needs to send the requirements to the data center, which improves user satisfaction.

After the adaptive module determines the target resource, S607 may be performed.

S607: The adaptive module sends information about the target training resource to the training module.

The information about the target training resource is, for example, the type and number of the computing units, the transmission links between the computing units, and the preferred sample iteration count of the target training resource.

After receiving the information about the target training resource, the training module performs S608.

S608: Perform the training task according to the information about the target training resource.

Optionally, the training module may adjust the number of samples deployed on each computing unit to obtain a preferred training rate.

Taking the ring 100 as an example, if the parameter generation rate of computing unit 0 is 5 gradients per second and that of computing unit 1 is 8 gradients per second, fewer samples can be deployed on computing unit 0 and more on computing unit 1.
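A sketch of this proportional allocation; the function name and the total batch size are assumptions, while the rates 5 and 8 come from the example above.

```python
def allocate_samples(total_samples, generation_rates):
    """Split a batch across computing units in proportion to each unit's
    gradient (parameter) generation rate."""
    total_rate = sum(generation_rates)
    shares = [int(total_samples * r / total_rate) for r in generation_rates]
    shares[-1] += total_samples - sum(shares)  # rounding remainder to the last unit
    return shares

print(allocate_samples(1300, [5, 8]))  # -> [500, 800]
```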
The foregoing describes in detail examples of the method for training a neural network provided by this application. It can be understood that, to implement the above functions, the apparatus for training a neural network includes hardware structures and/or software modules corresponding to each function. Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented by hardware or by a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.

This application may divide the apparatus for training a neural network into functional units according to the above method example; for example, each function may be divided into a separate functional unit, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in this application is schematic and is only a logical functional division; in actual implementation there may be other division manners.

In the case of integrated units, FIG. 7 shows a possible schematic structural diagram of the apparatus for training a neural network provided by this application. The apparatus 700 includes a processing unit 701, which is configured to control the apparatus 700 to perform the steps of the method shown in FIG. 6. The processing unit 701 may also be used to perform other processes of the techniques described herein. The apparatus 700 may further include an input/output unit 702 for communicating with other devices (for example, user equipment) and a storage unit 703 for storing program code and data of the apparatus 700.

For example, the processing unit 701 is configured to:

determine the number of training parameters of a neural network training task;

determine a target training resource from a training resource library according to the number of training parameters, where the training resource library includes at least one training resource, a correspondence exists between the at least one training resource and at least one number of parameters, the at least one training resource includes the target training resource, and the at least one number of parameters includes the number of training parameters of the neural network training task;

perform the neural network training task through the target training resource.

The processing unit 701 may be a processor or a controller, for example a CPU, a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors or a combination of a DSP and a microprocessor. The input/output unit 702 is, for example, a communication interface, and the storage unit 703 may be a memory.

When the processing unit 701 is a processor, the input/output unit 702 is a communication interface, and the storage unit 703 is a memory, the apparatus for training a neural network involved in this application may be the apparatus shown in FIG. 8.

Referring to FIG. 8, the apparatus 800 includes a processor 801, a communication interface 802 (optional), and a memory 803 (optional). The processor 801, the communication interface 802, and the memory 803 can communicate with one another through an internal connection path, transferring control and/or data signals.

Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the apparatus and units described above, reference may be made to the corresponding processes in the foregoing method embodiment; details are not repeated here.

Therefore, the apparatus for training a neural network provided by this application can determine the target training resource from the training resource library according to the training task of the neural network, and the task can be completed without providing the user with the resource pool's infrastructure, thereby reducing the risk caused by exposing that infrastructure and improving the security of the data center.

In addition, the user does not need to determine which training resources are required to complete the neural network training task; the user only needs to send the requirements to the data center, which improves user satisfaction.
Referring to FIG. 9, this application further provides a system architecture 200 for training a neural network.

The server 210 is configured with an input/output (I/O) interface 212 for data interaction with external devices (for example, the client device 230); a "user" can input a neural network training task to the I/O interface 212 through the client device 230. The server 210 is, for example, a data center.

The server 210 can call data, code, and the like in the data storage system 240, and can also store data, instructions, and the like in the data storage system 240.

The processor 211 can train the neural network using the method 600 shown in FIG. 6; for specific processing, refer to the related description of FIG. 6.

The training device 220 is configured to train the neural network according to commands from the processor 211. The training device 220 is, for example, the computing units shown in FIG. 1; since the training device 220 is used to process the neural network training task, it may also be regarded as a processor of the server 210.

Finally, the I/O interface 212 returns the processing result (for example, the trained neural network) to the client device 230 and provides it to the user.

In the case shown in FIG. 9, the user can manually specify the data input to the server 210, for example by operating in an interface provided by the I/O interface 212. In another case, the client device 230 can automatically input data to the I/O interface 212 and obtain results; if automatic input by the client device 230 requires the user's authorization, the user can set the corresponding permissions in the client device 230. The user can view the result output by the server 210 on the client device 230, and the specific presentation form may be, for example, display of the output result on a screen. The client device 230 can also serve as a data collection terminal and store the collected data (for example, training samples) in the data storage system 240.

It is worth noting that FIG. 9 is only a schematic diagram of a system architecture provided by an embodiment of the present invention; the positional relationships among the devices, components, and modules shown in the figure do not constitute any limitation on the technical solution of this application. For example, in FIG. 9 the data storage system 240 is external storage relative to the server 210; optionally, the data storage system 240 may also be placed in the server 210. Similarly, the training device 220 may also be placed in the server 210.

In the embodiments of this application, the sequence numbers of the processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of this application.

In addition, the term "and/or" herein describes only an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A alone, both A and B, and B alone. The character "/" herein generally indicates an "or" relationship between the associated objects.

The steps of the methods or algorithms described in connection with the disclosure of this application may be implemented in hardware or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may reside in an ASIC.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted via the computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.

The specific implementations above further describe in detail the objectives, technical solutions, and beneficial effects of this application. It should be understood that the foregoing is only the specific implementation of this application and is not intended to limit its protection scope; any modification, equivalent replacement, or improvement made on the basis of the technical solutions of this application shall fall within its protection scope.

Claims (13)

  1. A method for training a neural network, comprising:
    determining the number of training parameters of a neural network training task;
    determining a target training resource from a training resource library according to the number of training parameters, wherein the training resource library comprises at least one training resource, a correspondence exists between the at least one training resource and at least one number of parameters, the at least one training resource comprises the target training resource, and the at least one number of parameters comprises the number of training parameters of the neural network training task; and
    performing the neural network training task through the target training resource.
  2. The method according to claim 1, wherein before the determining a target training resource from a training resource library according to the number of training parameters, the method further comprises:
    establishing the training resource library, wherein the target training resource comprises a plurality of computing units and transmission links between the plurality of computing units, and the correspondence comprises an association among the following three: the target training resource, the at least one number of parameters, and a parameter update rate of the at least one number of parameters.
  3. The method according to claim 2, wherein the establishing the training resource library comprises:
    updating a plurality of neural network parameters through the target training resource, wherein the quantity of the plurality of neural network parameters is any one of the at least one number of parameters;
    determining a parameter update rate of the plurality of neural network parameters according to an update completion time of the plurality of neural network parameters, wherein the parameter update rate of the plurality of neural network parameters is inversely proportional to the update completion time of the plurality of neural network parameters; and
    saving the correspondence among the parameter update rate of the plurality of neural network parameters, the quantity of the plurality of neural network parameters, and the target training resource.
  4. The method according to claim 2 or 3, wherein the neural network training task further comprises a training model of the neural network training task and a specified sample iteration count, the sample iteration count being the number of training samples that must be input to update the parameters once, and
    the determining a target training resource from a training resource library according to the number of training parameters comprises:
    determining, according to the correspondence, at least one candidate training resource corresponding to the number of training parameters from the training resource library;
    testing the training model on the at least one candidate training resource to determine a parameter generation rate of the at least one candidate training resource;
    determining a preferred sample iteration count of the at least one candidate training resource according to the parameter generation rate, the preferred sample iteration count being the sample iteration count of the candidate training resource when the parameter generation rate of the candidate training resource matches the parameter update rate; and
    determining, from the at least one candidate training resource, the candidate training resource whose preferred sample iteration count is closest to the specified sample iteration count as the target training resource.
  5. The method according to claim 2 or 3, wherein the neural network training task further comprises a training model of the neural network training task, and
    the determining a target training resource from a training resource library according to the number of training parameters comprises:
    determining, according to the correspondence, at least one candidate training resource corresponding to the number of training parameters from the training resource library;
    testing the training model on the at least one candidate training resource to determine a parameter generation rate of the at least one candidate training resource;
    determining a preferred sample iteration count of the at least one candidate training resource according to the parameter generation rate, the preferred sample iteration count being the sample iteration count of the candidate training resource when the parameter generation rate of the candidate training resource matches the parameter update rate; and
    determining, from the at least one candidate training resource, the candidate training resource with the largest preferred sample iteration count as the target training resource.
  6. The method according to any one of claims 1 to 5, wherein, in the target training resource, the number of training samples carried by any one of a plurality of computing units is directly proportional to the parameter update rate of that computing unit.
  7. An apparatus for training a neural network, comprising a processing unit configured to:
    determine the number of training parameters of a neural network training task;
    determine a target training resource from a training resource library according to the number of training parameters, wherein the training resource library comprises at least one training resource, a correspondence exists between the at least one training resource and at least one number of parameters, the at least one training resource comprises the target training resource, and the at least one number of parameters comprises the number of training parameters of the neural network training task; and
    perform the neural network training task through the target training resource.
  8. The apparatus according to claim 7, wherein the processing unit is further configured to:
    establish the training resource library, wherein the target training resource comprises a plurality of computing units and transmission links between the plurality of computing units, and the correspondence comprises an association among the following three: the target training resource, the at least one number of parameters, and a parameter update rate of the at least one number of parameters.
  9. The apparatus according to claim 8, wherein the processing unit is specifically configured to:
    update a plurality of neural network parameters through the target training resource, wherein the quantity of the plurality of neural network parameters is any one of the at least one number of parameters;
    determine a parameter update rate of the plurality of neural network parameters according to an update completion time of the plurality of neural network parameters, wherein the parameter update rate of the plurality of neural network parameters is inversely proportional to the update completion time of the plurality of neural network parameters; and
    save the correspondence among the parameter update rate of the plurality of neural network parameters, the quantity of the plurality of neural network parameters, and the target training resource.
  10. The apparatus according to claim 8 or 9, wherein the neural network training task further comprises a training model of the neural network training task and a specified sample iteration count, the sample iteration count being the number of training samples that must be input to update the parameters once, and
    the processing unit is specifically configured to:
    determine, according to the correspondence, at least one candidate training resource corresponding to the number of training parameters from the training resource library;
    test the training model on the at least one candidate training resource to determine a parameter generation rate of the at least one candidate training resource;
    determine a preferred sample iteration count of the at least one candidate training resource according to the parameter generation rate, the preferred sample iteration count being the sample iteration count of the candidate training resource when the parameter generation rate of the candidate training resource matches the parameter update rate; and
    determine, from the at least one candidate training resource, the candidate training resource whose preferred sample iteration count is closest to the specified sample iteration count as the target training resource.
  11. The apparatus according to claim 8 or 9, wherein the neural network training task further comprises a training model of the neural network training task, and
    the processing unit is specifically configured to:
    determine, according to the correspondence, at least one candidate training resource corresponding to the number of training parameters from the training resource library;
    test the training model on the at least one candidate training resource to determine a parameter generation rate of the at least one candidate training resource;
    determine a preferred sample iteration count of the at least one candidate training resource according to the parameter generation rate, the preferred sample iteration count being the sample iteration count of the candidate training resource when the parameter generation rate of the candidate training resource matches the parameter update rate; and
    determine, from the at least one candidate training resource, the candidate training resource with the largest preferred sample iteration count as the target training resource.
  12. The apparatus according to any one of claims 7 to 11, wherein, in the target training resource, the number of training samples carried by any one of a plurality of computing units is directly proportional to the parameter update rate of that computing unit.
  13. A system for training a neural network, comprising a processor, a plurality of computing units, and a memory, wherein the processor is configured to perform the method according to any one of claims 1 to 6 based on instructions stored in the memory, to determine the target training resource; and
    the plurality of computing units are configured to perform the neural network training task through the target training resource.
PCT/CN2018/109212 2018-09-30 2018-09-30 Method and apparatus for training a neural network WO2020062303A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880095511.2A 2018-09-30 2018-09-30 Method and apparatus for training a neural network
PCT/CN2018/109212 2018-09-30 2018-09-30 Method and apparatus for training a neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/109212 2018-09-30 2018-09-30 Method and apparatus for training a neural network

Publications (1)

Publication Number Publication Date
WO2020062303A1 true WO2020062303A1 (zh) 2020-04-02

Family

ID=69950962

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/109212 WO2020062303A1 (zh) Method and apparatus for training a neural network

Country Status (2)

Country Link
CN (1) CN112400160A (zh)
WO (1) WO2020062303A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107688493A (zh) * Method, apparatus, and system for training a deep neural network
CN108280514A (zh) * FPGA-based sparse neural network acceleration system and design method
CN108364063A (zh) * Neural network training method and apparatus based on weight-allocated resources
CN108460453A (zh) * Data processing method, apparatus, and system for CTC training

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106201723A (zh) * Resource scheduling method and apparatus for a data center
CN107808660A (zh) * Method and apparatus for training a neural network language model, and speech recognition method and apparatus
EP3336800B1 (de) * Determining a training function for generating annotated training images


Also Published As

Publication number Publication date
CN112400160A (zh) 2021-02-23

Similar Documents

Publication Publication Date Title
CN107766126B (zh) Container image construction method, system, apparatus, and storage medium
WO2018099084A1 (zh) Neural network model training method, apparatus, chip, and system
US10970069B2 (en) Meta-indexing, search, compliance, and test framework for software development
US9794343B2 (en) Reconfigurable cloud computing
US9262231B2 (en) System and method for modifying a hardware configuration of a cloud computing system
US9658895B2 (en) System and method for configuring boot-time parameters of nodes of a cloud computing system
JP2021505993A (ja) Robust gradient weight compression schemes for deep learning applications
WO2017124713A1 (zh) Data model determination method and apparatus
TW201820165A (zh) Server for a cloud big data computing architecture and cloud computing resource optimization method thereof
US11250073B2 (en) Method and apparatus for crowdsourced data gathering, extraction, and compensation
WO2018103562A1 (zh) Data processing system and method
JP7287397B2 (ja) Information processing method, information processing apparatus, and information processing program
US8539404B2 (en) Functional simulation redundancy reduction by state comparison and pruning
US20180062938A1 (en) Virtual agents for facilitation of network based storage reporting
WO2020238712A1 (zh) Cloud product recommendation method and apparatus, electronic device, and computer-readable medium
CN110727664A (zh) Method and device for performing a target operation on public cloud data
Kumar et al. Fog and edge computing simulators systems: research challenges and an overview
US20240095529A1 (en) Neural Network Optimization Method and Apparatus
CN110825589A (zh) Anomaly detection method and apparatus for a microservice system, and electronic device
WO2020107264A1 (zh) Neural network architecture search method and apparatus
US20230334325A1 (en) Model Training Method and Apparatus, Storage Medium, and Device
WO2020062303A1 (zh) Method and apparatus for training a neural network
US20220107817A1 (en) Dynamic System Parameter for Robotics Automation
US11748138B2 (en) Systems and methods for computing a success probability of a session launch using stochastic automata
US11811862B1 (en) System and method for management of workload distribution

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18935732

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18935732

Country of ref document: EP

Kind code of ref document: A1