WO2017167114A1 - Model training method and device for an Alexnet-like network - Google Patents

Model training method and device for an Alexnet-like network

Info

Publication number
WO2017167114A1
WO2017167114A1 · PCT/CN2017/077897
Authority
WO
WIPO (PCT)
Prior art keywords
network
gradient value
layer
alexnet
queue
Prior art date
Application number
PCT/CN2017/077897
Other languages
English (en)
French (fr)
Inventor
王思宇
Original Assignee
阿里巴巴集团控股有限公司
王思宇
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 and 王思宇
Publication of WO2017167114A1 publication Critical patent/WO2017167114A1/zh

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods

Definitions

  • the present application relates to the field of information technology, and in particular to a model training method for an Alexnet-like network and a model training device for an Alexnet-like network.
  • Artificial Intelligence is a new technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. It attempts to understand the essence of intelligence and to produce new intelligent machines that respond in a manner similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems. Since the birth of artificial intelligence, its theory and technology have matured steadily and its application fields have kept expanding. In recent years, Deep Learning has directly attempted to solve the problem of abstract cognition and has made breakthrough progress. The revolution detonated by deep learning has brought artificial intelligence to a new level; it is not only of great academic significance but also highly practical.
  • The motivation of deep learning is to build neural networks that simulate the human brain for analytical learning; it imitates the mechanism of the human brain to interpret data such as images, sound, and text. Deep learning is usually carried out by building a corresponding network model and training it. Convolutional Neural Networks (CNNs) are one such machine learning model under deep supervised learning, and the Alexnet network is a classic convolutional neural network that is often used by developers. Figure 1 shows an example of the structure of an Alexnet network.
  • In the Alexnet network, the two most important layer types are the convolutional layer, Convolution (Convolution1 to pool5 in Figure 1), and the fully connected layer, Inner Product (Inner Product6 to the loss layer in Figure 1).
  • The process of one round of model training in the Alexnet network can be described as follows: (1) data is first propagated forward from the Data layer to the Top layer, passing first through the convolutional layer portion and then through the fully connected layer portion; (2) the loss is calculated after propagation reaches the Top layer; (3) the loss is propagated backward from the Top layer to the Data layer, gradient values are calculated along the way, and the connection weights are finally updated; this pass goes first through the fully connected layer portion and then through the convolutional layer portion.
  • In the Alexnet network, whether in forward propagation or in back propagation, the convolutional layer portion has a very large amount of computation, accounting for more than 80% of the calculation time of the entire network, yet the parameter quantity it needs to update is very small, only 10% of the parameters of the entire network. The fully connected layer portion is exactly the opposite: it owns 90% of the parameters to be updated in the entire network, but its calculation time accounts for only 20% of the whole.
  • In a single-machine multi-card environment (that is, a computer equipped with multiple graphics processing units GPUs), in order to obtain lossless training results, a full copy of the model must be maintained on each GPU and training must proceed on all copies simultaneously. Taking two cards (two graphics processing unit GPUs) as an example, the two cards can be divided into a primary card and a secondary card; FIG. 2 is a working principle diagram of the primary card and the secondary card in the prior art.
  • After each round of training, the gradient values calculated by the model on the secondary card need to be sent to the model on the primary card, the primary card updates the parameters after computing the average of the gradient values, and finally the latest model on the primary card is broadcast to the secondary card before the next round of training can continue. In the prior art, all gradient values of all layers are generally computed first and only then sent to the primary card to be averaged and used to update the model; that is, all computation must finish before communication can start, so computation and communication follow a strict temporal order.
  • Therefore, if, following the prior art, the gradient values of the fully connected layer are computed first and, only after they have been aggregated on the primary card, the gradient values of the convolutional layer are computed, the whole process takes a very long time and seriously affects the operational efficiency of model training.
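To make this sequential baseline concrete, the following is a minimal sketch of one such training round for two cards. PyTorch is assumed purely for illustration (the patent names no framework), the function and variable names are hypothetical, and the batches are assumed to already reside on their respective devices.

```python
import torch

def baseline_round(master_model, slave_model, master_batch, slave_batch, loss_fn, lr=0.01):
    # 1. Both cards run forward + full backward (all layers) before any communication.
    for model, (x, y) in ((master_model, master_batch), (slave_model, slave_batch)):
        model.zero_grad()
        loss_fn(model(x), y).backward()

    # 2. Only after ALL gradients exist are the slave gradients sent to the master,
    #    averaged, and applied; computation and communication are strictly serial.
    with torch.no_grad():
        for p_m, p_s in zip(master_model.parameters(), slave_model.parameters()):
            avg_grad = (p_m.grad + p_s.grad.to(p_m.device)) / 2.0
            p_m -= lr * avg_grad                      # update the master model only
        # 3. Broadcast the updated master model back to the slave card.
        for p_m, p_s in zip(master_model.parameters(), slave_model.parameters()):
            p_s.copy_(p_m.to(p_s.device))
```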
  • In view of the above problems, embodiments of the present application are proposed in order to provide a model training method for an Alexnet-like network that overcomes the above problems, or at least partially solves them, and a corresponding model training device for an Alexnet-like network.
  • In order to solve the above problems, the present application discloses a model training method for an Alexnet-like network, including: calculating, by using a first graphics processing unit GPU, a first gradient value and a second gradient value under the Alexnet-like network; receiving a third gradient value under the Alexnet-like network sent by a second graphics processing unit GPU; calculating a first model parameter of the Alexnet-like network according to the first gradient value and the third gradient value; receiving a fourth gradient value under the Alexnet-like network sent by the second graphics processing unit GPU; calculating a second model parameter of the Alexnet-like network according to the second gradient value and the fourth gradient value; and training the model of the Alexnet-like network by using the first model parameter and the second model parameter.
  • Optionally, the Alexnet-like network is composed of a fully connected layer and a convolutional layer, and the step of calculating, by the first graphics processing unit GPU, the first gradient value and the second gradient value under the Alexnet-like network includes: calculating, by using the first graphics processing unit GPU, a first gradient value under the fully connected layer and a second gradient value under the convolutional layer.
  • Optionally, the first graphics processing unit GPU includes a first calculation queue, and the step of calculating, by the first graphics processing unit GPU, the first gradient value under the fully connected layer and the second gradient value under the convolutional layer includes: calculating, by using the first calculation queue, a first gradient value under the fully connected layer and a second gradient value under the convolutional layer.
  • Optionally, the first graphics processing unit GPU further includes a first communication queue, and the second graphics processing unit GPU includes a second communication queue; the step of receiving the third gradient value under the Alexnet-like network sent by the second graphics processing unit GPU includes: receiving, by using the first communication queue, the third gradient value sent by the second communication queue; and the step of receiving the fourth gradient value under the Alexnet-like network sent by the second graphics processing unit GPU includes: receiving, by using the first communication queue, the fourth gradient value sent by the second communication queue.
  • Optionally, the second graphics processing unit further includes a second calculation queue, and the third gradient value and the fourth gradient value are respectively obtained by the following steps: calculating, by using the second calculation queue, a third gradient value under the fully connected layer; and calculating, by using the second calculation queue, a fourth gradient value under the convolutional layer.
  • Optionally, the step of calculating the first model parameter of the Alexnet-like network according to the first gradient value and the third gradient value includes: calculating an average of the first gradient value and the third gradient value to obtain the first model parameter of the Alexnet-like network.
  • Optionally, the step of calculating the second model parameter of the Alexnet-like network according to the second gradient value and the fourth gradient value includes: calculating an average of the second gradient value and the fourth gradient value to obtain the second model parameter of the Alexnet-like network.
  • Optionally, before the step of calculating, by using the first graphics processing unit GPU, the first gradient value and the second gradient value under the Alexnet-like network, the method further includes: determining whether the network is an Alexnet-like network.
  • Optionally, the network includes m structural layers, and the step of determining whether the network is an Alexnet-like network includes: pre-training the network to obtain a calculation time and a parameter quantity of each structural layer; obtaining a total calculation time and a total parameter quantity of the network according to the calculation times and parameter quantities; accumulating the calculation times of the m structural layers layer by layer according to a preset transmission order, to obtain the sum of the calculation times up to the p-th layer; when the ratio of the sum of the calculation times up to the p-th layer to the total calculation time satisfies a first preset condition, accumulating the parameter quantities of the remaining m-p layers to obtain the sum of the parameter quantities of the remaining m-p layers; determining whether the ratio of the sum of the parameter quantities of the remaining m-p layers to the total parameter quantity satisfies a second preset condition; and if so, classifying the network as an Alexnet-like network.
  • Optionally, the step of dividing the network into an Alexnet-like network includes: dividing the first p layers of the network into the fully connected layer of the Alexnet-like network; and dividing the remaining m-p layers into the convolutional layer of the Alexnet-like network.
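Using the quantities introduced in the steps above — per-layer back propagation time $t_i$ and parameter quantity $v_i$ taken in the preset transmission order, with totals $T=\sum_{i=1}^{m} t_i$ and $V=\sum_{i=1}^{m} v_i$ — the two preset conditions can be written compactly as below. The comparison directions and the thresholds $\theta_1$, $\theta_2$ are assumptions consistent with the 10% example given later in the description, not values fixed by the claims.

```latex
\frac{1}{T}\sum_{i=1}^{p} t_i > \theta_1
\qquad\text{and}\qquad
\frac{1}{V}\sum_{i=p+1}^{m} v_i < \theta_2
```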
  • In order to solve the above problems, the present application also discloses a model training device for an Alexnet-like network, including:
  • a first calculating module configured to calculate, by using a first graphics processing unit GPU, a first gradient value and a second gradient value in an Alexnet-like network;
  • a first receiving module configured to receive a third gradient value sent by the second graphics processing unit GPU under the Alexnet-like network
  • a second calculating module configured to calculate a first model parameter of the Alexnet-like network according to the first gradient value and the third gradient value;
  • a second receiving module configured to receive a fourth gradient value sent by the second graphics processing unit GPU under the Alexnet-like network
  • a third calculating module configured to calculate a second model parameter of the Alexnet-like network according to the second gradient value and the fourth gradient value;
  • a training module configured to train the model of the Alexnet-like network by using the first model parameter and the second model parameter.
  • the Alexnet-like network is composed of a fully connected layer and a convolution layer
  • the first computing module includes:
  • a first calculation submodule configured to calculate, by the first graphics processing unit GPU, a first gradient value under the fully connected layer and a second gradient value under the convolution layer.
  • the first graphics processing unit GPU includes a first computing queue
  • the first computing submodule includes:
  • a first calculating unit configured to calculate, by using the first calculation queue, a first gradient value under the fully connected layer and a second gradient value under the convolution layer.
  • the first graphics processing unit GPU further includes a first communication queue
  • the second graphics processing unit GPU includes a second communication queue
  • the first receiving module includes:
  • a first receiving submodule configured to receive, by using the first communications queue, a third gradient value sent by the second communications queue
  • the second receiving module includes:
  • a second receiving submodule configured to receive, by using the first communications queue, a fourth gradient value sent by the second communications queue.
  • the second graphics processing unit further includes a second calculation queue, where the third gradient value and the fourth gradient value are respectively obtained by using the following modules:
  • a fourth calculating module configured to calculate, by using a second computing queue, a third gradient value under the fully connected layer
  • a fifth calculating module configured to calculate a fourth gradient value under the convolution layer by using a second calculation queue.
  • the second calculating module includes:
  • a first model parameter calculation submodule configured to calculate an average of the first gradient value and the third gradient value to obtain a first model parameter of the Alexnet-like network.
  • the third calculating module includes:
  • a second model parameter calculation submodule configured to calculate an average of the second gradient value and the fourth gradient value to obtain a second model parameter.
  • the device further includes:
  • the judging module is configured to determine whether the network is an Alexnet-like network.
  • the network includes m structural layers
  • the determining module includes:
  • a calculation time and parameter quantity obtaining submodule configured to pre-train the network to obtain a calculation time and a parameter quantity of each structural layer;
  • a total calculation time and total parameter quantity obtaining submodule configured to obtain a total calculation time and a total parameter quantity of the network according to the calculation times and parameter quantities;
  • an accumulating submodule configured to accumulate the calculation times of the m structural layers layer by layer according to a preset transmission order, to obtain the sum of the calculation times up to the p-th layer;
  • a parameter quantity sum obtaining submodule configured to accumulate the parameter quantities of the remaining m-p layers when the ratio of the sum of the calculation times up to the p-th layer to the total calculation time satisfies the first preset condition, to obtain the sum of the parameter quantities of the remaining m-p layers;
  • a determining submodule configured to determine whether the ratio of the sum of the parameter quantities of the remaining m-p layers to the total parameter quantity satisfies a second preset condition; and
  • a dividing submodule configured to divide the network into an Alexnet-like network when the second preset condition is met.
  • the dividing submodule includes:
  • a fully connected layer dividing unit configured to divide the first p layers of the network into the fully connected layer of an Alexnet-like network; and
  • a convolutional layer dividing unit configured to divide the remaining m-p layers into the convolutional layer of an Alexnet-like network.
  • the embodiments of the present application include the following advantages:
  • In the embodiments of the present application, a corresponding calculation queue and communication queue are constructed on the first graphics processing unit GPU (the primary card) and on the second graphics processing unit GPU (the secondary card) respectively; the calculation process is performed by the calculation queues and data communication by the communication queues.
  • In this way, the two processes of calculation and communication are carried out separately, and the calculation of the convolutional layer of the Alexnet-like network is further run in parallel with the communication of the fully connected layer parameters, which effectively reduces the time spent in the model training process and improves the operational efficiency of model training.
  • In addition, before model training, the network may be pre-trained and the obtained time parameters analyzed to determine whether the network is an Alexnet-like network.
  • FIG. 1 is a diagram showing an example of the structure of an Alexnet network
  • FIG. 2 is a schematic diagram of the working principle of the primary card and the secondary card in the prior art
  • FIG. 3 is a flow chart of the steps of Embodiment 1 of a model training method for an Alexnet-like network according to the present application;
  • FIG. 4 is a working principle diagram of Embodiment 1 of a model training method for an Alexnet-like network according to the present application;
  • FIG. 5 is a flow chart of steps of a second embodiment of a model training method for an Alexnet network according to the present application
  • FIG. 6 is a statistical diagram of the data back propagation calculation time and parameter quantity of Embodiment 2 of a model training method for an Alexnet-like network according to the present application;
  • FIG. 7 is a flowchart of an algorithm for determining whether a network is an Alexnet-like network according to the present application.
  • FIG. 8 is a structural block diagram of an embodiment of a model training device for an Alexnet network according to the present application.
  • Referring to FIG. 3, a flow chart of the steps of Embodiment 1 of a model training method for an Alexnet-like network of the present application is shown, which may specifically include the following steps:
  • Step 301 Calculate a first gradient value and a second gradient value in an Alexnet-like network by using a first graphics processing unit GPU;
  • In the Alexnet network, whether in forward propagation or in back propagation, the convolutional layer portion has a very large amount of computation, accounting for more than 80% of the calculation time of the entire network, yet the parameter quantity it needs to update is very small, only 10% of the parameters of the entire network; the fully connected layer portion is exactly the opposite, owning 90% of the parameters to be updated in the entire network while its calculation time accounts for only 20% of the whole.
  • A network that has the above characteristics, and in which the data passes first through a convolutional layer portion and then through a fully connected layer portion during forward propagation, may be referred to as an Alexnet-like network.
  • the Alexnet-like network may consist of a fully connected layer and a convolutional layer.
  • A graphics processing unit (GPU), also known as a display core, is a microprocessor dedicated to image and graphics computation on personal computers, workstations, game consoles, and some mobile devices (such as tablets and smartphones); it is often used for high-performance computing and provides highly concurrent processing of data.
  • the first graphics processing unit GPU can be regarded as a master card in a single-machine multi-card environment
  • the second graphics processing unit GPU can be regarded as a slave card in a single-machine multi-card environment.
  • During training, the primary card and the secondary card must hold the same network structure; therefore, after the start, the primary card needs to broadcast its network structure to the secondary card, and the secondary card receives it through the Receive Model process, so that the two cards are kept consistent. The two cards then begin to perform the same behavior, the goal being to perform forward propagation and calculate the Loss value.
  • Forward propagation is the process of calculating from the first layer to the last layer.
  • Specifically, forward propagation proceeds in the direction indicated by the arrows between the layers of the Alexnet network in FIG. 1, for example from Data to Convolution1, then to relu1, and so on until the final loss layer. At that point the loss layer produces a value called the Loss value, and the subsequent back propagation can only be carried out once this Loss value has been obtained.
  • During forward propagation, the convolutional layers are passed through first and then the fully connected layers.
  • During back propagation the order is reversed: the fully connected layers are passed through first and then the convolutional layers, and the gradient values of the respective layers are calculated along the way.
  • The loss function Loss can be used to evaluate whether a classification, or a regression, is accurate.
  • The Loss value is related to the parameters of the neural network: if the parameters meet the requirements of the application scenario, the Loss value will be lower. If all the model parameters of the network are gathered into a vector w, then the Loss value is a function of w, and a good w vector generally lowers the Loss value. The question therefore becomes how to find a good w vector, and this is what training lets the model work out by itself.
  • To do so, the model must find the direction that causes the Loss value to fall, and mathematically the gradient gives the direction in which the Loss value changes fastest; as long as the w vector is updated one step at a time along this direction of fastest descent, the Loss value decreases. This is the role of the gradient.
  • The gradient is obtained from the partial derivatives of the Loss value with respect to the components of the w vector, and these partial derivatives are computed during the back propagation of the data.
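In standard notation, with loss $L(w)$ and learning rate $\eta$, the gradient is the vector of partial derivatives computed during back propagation, and one update step moves $w$ against it (the usual gradient descent sign convention):

```latex
\nabla_w L = \left(\frac{\partial L}{\partial w_1}, \dots, \frac{\partial L}{\partial w_k}\right),
\qquad
w \leftarrow w - \eta \, \nabla_w L
```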
  • In the embodiment of the present application, the first gradient value is a fully connected layer gradient, and the second gradient value is a convolutional layer gradient.
  • In a preferred embodiment of the present application, the step of calculating the first gradient value and the second gradient value under the Alexnet-like network by using the first graphics processing unit GPU may specifically include the following sub-step:
  • Sub-step 3011 using a first graphics processing unit GPU to calculate a first gradient value under the fully connected layer and a second gradient value under the convolution layer.
  • CUDA is a general-purpose parallel computing architecture introduced by NVIDIA that enables GPUs to solve complex computational problems and enable GPU programming on a computer.
  • In a specific implementation, the first graphics processing unit GPU may include a first calculation queue and a first communication queue, and the second graphics processing unit GPU on the secondary card may include a second calculation queue and a second communication queue. The first calculation queue, the first communication queue, the second calculation queue, and the second communication queue are all CUDA streams; the first and second calculation queues are used for calculation, and the first and second communication queues are used for communication, so that the calculation and communication of the primary and secondary cards are separated and can proceed in parallel.
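The calculation and communication queues can be realized as CUDA streams. A minimal sketch, assuming PyTorch as the CUDA front end (the patent does not name a framework), might set them up as follows:

```python
import torch

master, slave = torch.device('cuda:0'), torch.device('cuda:1')

# One calculation stream and one communication stream per card; each
# torch.cuda.Stream wraps a CUDA stream, i.e. an ordered work queue on that GPU.
master_compute = torch.cuda.Stream(device=master)
master_comm    = torch.cuda.Stream(device=master)
slave_compute  = torch.cuda.Stream(device=slave)
slave_comm     = torch.cuda.Stream(device=slave)
```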
  • Therefore, the sub-step of calculating, by the first graphics processing unit GPU, the first gradient value under the fully connected layer and the second gradient value under the convolutional layer may further include:
  • a first gradient value under the fully connected layer and a second gradient value under the convolutional layer are calculated using a first computational queue.
  • It should be noted that when the first gradient value under the fully connected layer is calculated, only the first half of back propagation is performed.
  • The complete back propagation process starts from the loss layer (the last layer) and propagates layer by layer in the direction opposite to the arrows, for example from the loss layer to inner product8, then to drop7, and so on until convolution1.
  • The process of calculating the first gradient value only covers the propagation from the loss layer to inner product6 in the figure.
  • During this part of back propagation, each layer that has parameters calculates its own gradient values and stores them in the layer (layers without parameters, such as the loss layer, the drop layers, and the relu layers, do not calculate gradients; only the inner product layers do).
  • The gradient parameters produced by this process are very numerous, but the computation itself is very fast; this is the characteristic of this stage.
  • When the second gradient value under the convolutional layer is calculated, only the second half of back propagation is performed, that is, the propagation from pool5 to convolution1.
  • During this part of back propagation, each layer that has parameters calculates its own gradient values and stores them in the layer (layers without parameters, such as the relu, norm, and pool layers, do not calculate gradients; only the convolution layers do).
  • The gradient parameters produced by this process are very few, but the computation itself is very slow; this is the characteristic of this stage.
  • Step 302 Receive a third gradient value sent by the second graphics processing unit GPU under the Alexnet-like network.
  • the third gradient value may be specifically obtained by the following steps:
  • a third gradient value under the fully connected layer is calculated using a second computational queue.
  • In a specific implementation, the second graphics processing unit GPU calculates the third gradient value under the fully connected layer by using the second calculation queue on the secondary card, and this calculation is performed in parallel with the calculation of the first gradient value by the first calculation queue on the primary card.
  • the step of receiving the third gradient value in the Alexnet-like network sent by the GPU of the second graphics processing unit may specifically include the following sub-steps:
  • Sub-step 3021 Receive a third gradient value sent by the second communication queue by using the first communication queue.
  • In a specific implementation, the calculation queues perform the corresponding calculation processes and the communication queues perform the corresponding data transmission and reception; therefore, the first communication queue may be used to receive the third gradient value sent by the second communication queue.
  • In the embodiment of the present application, the calculation of the convolutional layer and the parameter communication of the fully connected layer can be performed in parallel by means of stream parallelism; that is, while the primary card calculates the second gradient value with the first calculation queue, the first communication queue receives the third gradient value sent by the second communication queue, so that calculation and communication overlap in time without interfering with each other.
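Building on the streams from the earlier sketch, the overlap described here can be illustrated as follows. This is a simplified sketch, not the patent's exact implementation; `conv_backward` and the gradient tensor lists are hypothetical placeholders.

```python
import torch

def overlapped_backward(master_compute, master_comm, fc_grads_slave, fc_grads_master_buf,
                        conv_backward):
    # Communication queue: receive the slave card's fully connected gradients
    # (the "third gradient value") while ...
    with torch.cuda.stream(master_comm):
        for dst, src in zip(fc_grads_master_buf, fc_grads_slave):
            dst.copy_(src, non_blocking=True)   # async device-to-device copy

    # ... the calculation queue keeps computing the convolutional-layer gradients
    # (the "second gradient value") at the same time.
    with torch.cuda.stream(master_compute):
        conv_backward()

    # Both queues must finish before the fully connected parameters are updated.
    master_comm.synchronize()
    master_compute.synchronize()
```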
  • Step 303 Calculate a first model parameter of the Alexnet-like network according to the first gradient value and the third gradient value.
  • It should be noted that the first gradient value and the third gradient value are the gradients of the fully connected layer of the Alexnet-like network on the primary card and the secondary card respectively; therefore, when the data from the secondary card is aggregated onto the primary card, the data of the fully connected layer needs to be updated based on the data of both the primary card and the secondary card.
  • the step of calculating the first model parameter of the Alexnet-like network according to the first gradient value and the third gradient value may specifically include the following sub-steps:
  • Sub-step 3031 calculating an average of the first gradient value and the third gradient value to obtain a first model parameter of the Alexnet-like network.
  • the first model parameter is the updated fully connected layer gradient.
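In symbols, with $g_1$ and $g_3$ the fully connected layer gradients of the primary and secondary cards, the first model parameter is obtained from their average; applying it to the weights with a learning rate $\eta$ is shown only as an assumption here, since the text states merely that the parameters are updated with the averaged gradient:

```latex
\bar{g}_{fc} = \frac{g_1 + g_3}{2},
\qquad
w_{fc} \leftarrow w_{fc} - \eta \, \bar{g}_{fc}
```

The second model parameter is obtained analogously from the convolutional layer gradients $g_2$ and $g_4$.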
  • Step 304 Receive a fourth gradient value sent by the second graphics processing unit GPU under the Alexnet-like network.
  • the fourth gradient value may be specifically obtained by the following steps:
  • a fourth gradient value under the convolutional layer is calculated using a second computational queue.
  • In a specific implementation, the second graphics processing unit GPU calculates the fourth gradient value under the convolutional layer by using the second calculation queue on the secondary card, and this calculation is performed in parallel with the calculation of the second gradient value by the first calculation queue on the primary card.
  • the step of receiving the fourth gradient value in the Alexnet-like network sent by the GPU of the second graphics processing unit may specifically include the following sub-steps:
  • Sub-step 3041 Receive a fourth gradient value sent by the second communication queue by using the first communication queue.
  • In a specific implementation, the calculation queues perform the corresponding calculation processes and the communication queues perform the corresponding data transmission and reception; therefore, the first communication queue may be used to receive the fourth gradient value sent by the second communication queue.
  • Step 305 Calculate a second model parameter of the Alexnet-like network according to the second gradient value and the fourth gradient value.
  • It should be noted that the second gradient value and the fourth gradient value are the gradients of the convolutional layer of the Alexnet-like network on the primary card and the secondary card respectively; therefore, when the data from the secondary card is aggregated onto the primary card, the data of the convolutional layer needs to be updated based on the data of both the primary card and the secondary card.
  • the step of calculating the second model parameter of the Alexnet-like network according to the second gradient value and the fourth gradient value may specifically include the following sub-steps:
  • Sub-step 3051 calculating an average of the second gradient value and the fourth gradient value to obtain a second model parameter of the Alexnet-like network.
  • the second model parameter is the updated convolution layer gradient.
  • Step 306 Train the model of the Alexnet-like network with the first model parameter and the second model parameter.
  • After the first model parameter and the second model parameter are obtained, the primary card may update its own model parameters to the first model parameter and the second model parameter, obtaining a new training model.
  • T1 = a + b + c + m + n, where c ≫ b and m ≫ n;
  • whereas in the embodiment of the present application, stream parallelism is used to run the calculation of the convolutional layer in parallel with the communication of the fully connected layer parameters during back propagation, and the total time T2 is:
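The expression for T2 itself does not survive in this text. Purely as a hedged reading, if a and b are taken to denote the forward and fully connected backward times, c the convolutional backward time, and m and n the communication times of the fully connected and convolutional gradients, then overlapping c with m would give roughly:

```latex
T_2 \approx a + b + \max(c, m) + n \;<\; a + b + c + m + n = T_1
```

These variable meanings are assumptions made here for illustration, not definitions given by the source.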
  • In summary, in the embodiment of the present application, a calculation queue and a communication queue are constructed on the first graphics processing unit GPU (the primary card) and on the second graphics processing unit GPU (the secondary card) respectively; the calculation process is carried out by the calculation queues and data communication by the communication queues, so that the two processes of calculation and communication are separated, and the calculation of the convolutional layer of the Alexnet-like network is further run in parallel with the communication of the fully connected layer parameters. This effectively reduces the time spent in the model training process and improves the operational efficiency of model training.
  • Referring to FIG. 5, a flow chart of the steps of Embodiment 2 of the model training method for an Alexnet-like network of the present application is shown, which may specifically include the following steps:
  • Step 501 Determine whether the network is an Alexnet-like network.
  • Generalizing, the network can be divided into two parts, called M and N respectively; in the direction opposite to forward propagation, the M part is calculated first and then the N part. If the calculation of the M part occupies only a small share of the total time while its parameters account for a large share of all parameters, and the N part has the opposite characteristics, then the network can be called an Alexnet-like network.
  • In the embodiment of the present application, for a network including m structural layers, it may first be determined whether the network is an Alexnet-like network.
  • the step of determining whether the network is an Alexnet-like network may specifically include the following sub-steps:
  • Sub-step 5011 pre-training the network to obtain a calculation time and a parameter quantity of each structural layer
  • Sub-step 5012 obtaining a total calculation time and a total parameter quantity of the network according to the calculation time and the parameter quantity;
  • Sub-step 5013 accumulating the calculation time of the m structural layers layer by layer according to a preset transmission sequence, respectively obtaining the sum of the calculation times up to the p-th layer;
  • Sub-step 5014 when the ratio of the sum of the calculation times up to the p-th layer to the total calculation time satisfies the first preset condition, accumulating the parameter quantities of the remaining m-p layers to obtain the sum of the parameter quantities of the remaining m-p layers;
  • Sub-step 5015 determining whether a ratio of a sum of parameter quantities of the remaining m-p layers to the total parameter quantity satisfies a second preset condition
  • In a specific implementation, the computer may be used to determine whether the current network is an Alexnet-like network by analyzing the time parameters obtained from pre-training.
  • Specifically, the network is pre-trained to obtain the calculation time and the parameter quantity of each structural layer; the total calculation time and the total parameter quantity of the network are then obtained from these. Next, following a preset transmission order (generally the back propagation direction, that is, from the last layer of the network towards the first layer), the calculation times of the m structural layers are accumulated layer by layer to obtain the sum of the calculation times up to the p-th layer. When the ratio of this sum to the total calculation time satisfies the first preset condition, the parameter quantities of the remaining m-p layers are accumulated to obtain their sum, and finally it is determined whether the ratio of the sum of the parameter quantities of the remaining m-p layers to the total parameter quantity satisfies the second preset condition.
  • Since an Alexnet-like network is characterized in that the part with a large amount of calculation has a small parameter quantity and the part with a small amount of calculation has a large parameter quantity, a person skilled in the art can set the first preset condition and the second preset condition accordingly; the specific numerical values are not limited in the present application.
  • Specifically, the pre-trained network can be divided into two parts, part M and part N, so the problem reduces to choosing the demarcation point between M and N. The demarcation point may be selected as follows: the network is pre-trained several times before training, and the calculation time and parameter quantity of each layer during back propagation are recorded for each run. The sum of the back propagation times of all layers is denoted T and the total parameter quantity of all layers is denoted V; starting from the first layer of back propagation, the calculation times of successive layers are accumulated into t, the accumulation is stopped as soon as t/T > 0.1, and the current layer is recorded as the p-th layer.
  • The parameter quantities of the layers from the p-th layer to the last layer of back propagation are then summed into v. If v/V < 0.1, the network can be considered an Alexnet-like network and sub-step 5016 can be performed; if v/V > 0.1, the network can be considered not to be Alexnet-like.
  • Sub-step 5016 the network is divided into an Alexnet-like network.
  • In a preferred embodiment of the present application, the sub-step of dividing the network into an Alexnet-like network may further include: dividing the first p layers of the network into the fully connected layer of the Alexnet-like network; and dividing the remaining m-p layers into the convolutional layer of the Alexnet-like network.
  • When the network is confirmed to be an Alexnet-like network, it may be divided into a fully connected layer portion and a convolutional layer portion according to the obtained demarcation point, that is, the p-th layer obtained in sub-steps 5011 to 5015.
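A compact sketch of this check, with the 10% thresholds from the description hard-coded; the function name, argument names, and return convention are hypothetical.

```python
def alexnet_like_split(layer_times, layer_params, time_ratio=0.1, param_ratio=0.1):
    """Per-layer back propagation times and parameter quantities, listed in
    back propagation order (loss layer first), e.g. gathered by pre-training.
    Returns the demarcation index p if the network is Alexnet-like, else None."""
    T, V = sum(layer_times), sum(layer_params)
    t, p = 0.0, 0
    for layer_time in layer_times:
        t += layer_time
        p += 1
        if t / T > time_ratio:       # sum of times up to layer p exceeds 10% of T
            break
    v = sum(layer_params[p:])        # parameter quantity of the remaining m - p layers
    if v / V < param_ratio:          # remaining layers hold < 10% of all parameters
        return p                     # layers 1..p -> "fully connected" part,
                                     # layers p+1..m -> "convolutional" part
    return None
```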
  • In FIG. 6, the abscissa is the layer number; the left part shows the calculation time of each layer during back propagation, with the ordinate in ms, and the right part shows the parameter quantity owned by each layer, in bytes.
  • As can be seen from FIG. 6, the sum of the calculation times of all layers from the starting layer of back propagation to the p-th layer obtained above accounts for 10% of the total back propagation time. If, at the same time, the sum of the parameter quantities of all layers from the p-th layer to the last layer of back propagation accounts for about 10% of the total parameter quantity, this means that the layers from the first back-propagated layer to the p-th layer own 90% of the parameters. The portion from the layer where back propagation starts to the p-th layer is then M, i.e., the fully connected layer portion, and the portion from the p-th layer to the layer where back propagation ends is N, i.e., the convolutional layer portion.
  • Step 502 Calculate, by using a first calculation queue, a first gradient value under the fully connected layer and a second gradient value under the convolution layer;
  • the calculation queue and the communication queue may be respectively constructed on the first graphics processing unit GPU, that is, the main card, and the second graphics processing unit GPU, that is, the slave card.
  • Specifically, the first calculation queue and the first communication queue may be built on the primary card, and the second calculation queue and the second communication queue on the secondary card; the corresponding calculation processes are performed by the first and second calculation queues and the corresponding communication processes by the first and second communication queues, so that the two overlap in time.
  • the first gradient value under the fully connected layer and the second gradient value under the convolution layer may be calculated by using the first calculation queue.
  • It should be noted that when the first gradient value under the fully connected layer is calculated, only the first half of back propagation is performed.
  • The complete back propagation process starts from the loss layer (the last layer) and propagates layer by layer in the direction opposite to the arrows, for example from the loss layer to inner product8, then to drop7, and so on until convolution1.
  • The process of calculating the first gradient value only covers the propagation from the loss layer to inner product6 in the figure.
  • During this part of back propagation, each layer that has parameters calculates its own gradient values and stores them in the layer (layers without parameters, such as the loss layer, the drop layers, and the relu layers, do not calculate gradients; only the inner product layers do).
  • The gradient parameters produced by this process are very numerous, but the computation itself is very fast; this is the characteristic of this stage.
  • When the second gradient value under the convolutional layer is calculated, only the second half of back propagation is performed, that is, the propagation from pool5 to convolution1.
  • During this part of back propagation, each layer that has parameters calculates its own gradient values and stores them in the layer (layers without parameters, such as the relu, norm, and pool layers, do not calculate gradients; only the convolution layers do).
  • The gradient parameters produced by this process are very few, but the computation itself is very slow; this is the characteristic of this stage.
  • Step 503 Receive a third gradient value sent by the second communication queue by using the first communication queue.
  • In the embodiment of the present application, the third gradient value may be calculated by the second graphics processing unit GPU using the second calculation queue on the secondary card, and this calculation is performed in parallel with the calculation of the first gradient value by the first calculation queue on the primary card.
  • Since the calculation processes and the communication processes of the primary card and the secondary card are carried out separately, the calculation queues perform the corresponding calculation processes and the communication queues perform the corresponding data transmission and reception; therefore, the first communication queue may be used to receive the third gradient value sent by the second communication queue.
  • In the embodiment of the present application, the calculation of the convolutional layer and the parameter communication of the fully connected layer may be performed in parallel by means of stream parallelism; that is, while the primary card calculates the second gradient value with the first calculation queue, the first communication queue receives the third gradient value sent by the second communication queue, so that calculation and communication overlap in time without interfering with each other.
  • Step 504 Calculate an average of the first gradient value and the third gradient value to obtain a first model parameter of the Alexnet-like network.
  • In the embodiment of the present application, the first gradient value and the third gradient value are the gradients of the fully connected layer of the Alexnet-like network on the primary card and the secondary card respectively; therefore, after the data from the secondary card has been aggregated onto the primary card, the data of the fully connected layer needs to be updated based on the data of both the primary card and the secondary card.
  • the first gradient value and the third gradient value may be added and averaged to obtain a first model parameter, where the first model parameter is an updated fully connected layer gradient.
  • Step 505 Receive a fourth gradient value sent by the second communication queue by using the first communication queue.
  • the fourth gradient value may be obtained by using a second calculation queue, and then the fourth gradient value may be sent to the primary card by using the second communication queue.
  • The second graphics processing unit GPU calculates the fourth gradient value under the convolutional layer by using the second calculation queue on the secondary card, and this calculation is performed in parallel with the calculation of the second gradient value by the first calculation queue on the primary card.
  • Step 506 Calculate an average of the second gradient value and the fourth gradient value to obtain a second model parameter of the Alexnet-like network.
  • the second gradient value and the fourth gradient value may be added and averaged to obtain a second model parameter, where the second model parameter is an updated convolution layer gradient.
  • Step 507 Train the model of the Alexnet-like network with the first model parameter and the second model parameter.
  • In the embodiment of the present application, before model training is performed with a certain network, the network may be pre-trained and the obtained time parameters analyzed to determine whether the network is an Alexnet-like network.
  • The BroadCast Model process of the primary card and the Receive Model process of the secondary card: the BroadCast Model process transmits the model of the primary card to the secondary card, and the Receive Model process of the secondary card is responsible for receiving it, so that both cards hold the same Alexnet-like network structure. The reason the gradients are sent to the primary card and updated there is that during model training only the model of the primary card is updated, regardless of the secondary card, because the model of the primary card is broadcast to the secondary card before the second round of Forward begins; the primary card's model can therefore always serve as the reference.
  • The Forward process of the primary card and the Forward process of the secondary card: in this process the two cards behave the same; it is the forward propagation, which follows the direction indicated by the arrows between the layers of the Alexnet network in FIG. 1, for example from Data to Convolution1, then to relu1, and so on until the final loss layer. At that point the loss layer produces the Loss value, and the subsequent Backward process requires this Loss value, so Forward must be performed before Backward.
  • During forward propagation, the convolutional layers are passed through first and then the fully connected layers; the calculation differs somewhat from layer to layer because each layer has its own calculation formula.
  • The Backward For Inner Product process of the primary card and the secondary card: in this process the two cards behave the same, and the process is only the first half of the complete Backward.
  • The complete Backward process starts from the loss layer (i.e., the last layer) and propagates layer by layer in the direction opposite to the arrows in Figure 1, for example from the loss layer to inner product8, then to drop7, and so on until convolution1.
  • The Backward For Inner Product process only includes the back propagation of the fully connected layer portion of Figure 1, that is, the propagation from the loss layer to inner product6; it is therefore the back propagation of the layers related to the fully connected layer.
  • During this process, each layer that has parameters calculates its own gradients and stores them in the layer (layers without parameters, such as the loss, drop, and relu layers, do not calculate gradients; only the inner product layers do).
  • The gradient parameters produced by this process are very numerous, but the computation itself is very fast; this is the characteristic of this stage.
  • The Backward For Convolution process of the primary card and the secondary card: in this process the two cards behave the same, and the process is only the second half of Backward.
  • The Backward For Convolution process only includes the back propagation of the convolutional layer portion of Figure 1, that is, the propagation from pool5 to convolution1; it is therefore the back propagation of the convolutional layer.
  • During this process, each layer that has parameters calculates its own gradients and stores them in the layer (layers without parameters, such as the relu, norm, and pool layers, do not calculate gradients; only the convolution layers do).
  • The gradient parameters produced by this process are very few, but the computation itself is very slow; this is the characteristic of this stage.
  • The Receive Inner product Gradients process of the primary card and the Send Inner product Gradients process of the secondary card: these two processes send and receive gradients. On the primary card it is a receive process, i.e., receiving the gradients calculated by the secondary card; on the secondary card it is a send process, i.e., transmitting the calculated gradients. On each card these processes follow the Backward For Inner Product process, meaning they must wait for Backward For Inner Product to finish before starting; however, they run in the communication queue while the calculation processes run in the calculation queue, so they execute concurrently with Backward For Convolution.
  • The Update Inner product Gradients process of the primary card and of the secondary card: these processes update the fully connected layer gradients, but the two cards behave differently.
  • On the primary card it averages the fully connected layer gradients, while on the secondary card it is an empty process that performs no action; in both cases the process is in the communication queue and executes concurrently with Backward For Convolution.
  • The processes that send and receive the convolutional gradients are also in the communication queue, but they depend on Backward For Convolution in the calculation queue, so they must wait for Backward For Convolution and Update Inner product Gradients to complete before executing.
  • The Update Convolution Gradients process of the primary card and the secondary card: these processes average the convolutional gradients, but again the two cards behave differently.
  • On the primary card it averages the convolutional gradients, while on the secondary card it is an empty process that performs no action.
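Putting the processes of FIG. 4 together, one master-card iteration might be organized as below. This is a schematic sketch only, assuming the PyTorch stream API used in the earlier snippets; broadcast_model, forward_and_loss, backward_fc, backward_conv, and the receive_*/average_* helpers are hypothetical stand-ins for the corresponding processes.

```python
import torch

def master_card_iteration(net, streams, helpers):
    compute, comm = streams['compute'], streams['comm']
    h = helpers

    h['broadcast_model'](net)                     # BroadCast Model -> slave's Receive Model

    with torch.cuda.stream(compute):
        loss = h['forward_and_loss'](net)         # Forward on the calculation queue
        fc_grads = h['backward_fc'](loss)         # Backward For Inner Product

    comm.wait_stream(compute)                     # FC gradients must exist before they are received
    with torch.cuda.stream(comm):                 # communication queue, runs concurrently with...
        slave_fc = h['receive_fc_grads']()        # Receive Inner product Gradients
        h['average_fc'](fc_grads, slave_fc)       # Update Inner product Gradients

    with torch.cuda.stream(compute):              # ...Backward For Convolution on the calc queue
        conv_grads = h['backward_conv']()

    comm.wait_stream(compute)
    with torch.cuda.stream(comm):
        slave_conv = h['receive_conv_grads']()    # Receive Convolution Gradients
        h['average_conv'](conv_grads, slave_conv) # Update Convolution Gradients

    comm.synchronize()                            # next iteration starts from the updated master model
```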
  • Referring to FIG. 8, a structural block diagram of an embodiment of a model training device for an Alexnet-like network according to the present application is shown, which may specifically include the following modules:
  • a first calculating module 801 configured to calculate, by using a first graphics processing unit GPU, a first gradient value and a second gradient value in an Alexnet-like network;
  • the first receiving module 802 is configured to receive a third gradient value that is sent by the second graphics processing unit GPU under the Alexnet-like network;
  • a second calculating module 803, configured to calculate a first model parameter of the Alexnet-like network according to the first gradient value and the third gradient value;
  • a second receiving module 804 configured to receive a fourth gradient value that is sent by the second graphics processing unit GPU under the Alexnet-like network
  • a third calculating module 805, configured to calculate a second model parameter of the Alexnet-like network according to the second gradient value and the fourth gradient value;
  • the training module 806 is configured to train the model of the Alexnet-like network by using the first model parameter and the second model parameter.
  • the Alexnet network may be composed of a full connection layer and a convolution layer
  • the first calculation module 801 may specifically include the following submodules:
  • a first calculation sub-module 8011 configured to calculate, by using the first graphics processing unit GPU, a first gradient value under the fully connected layer and a second gradient value under the convolutional layer.
  • the first graphics processing unit GPU may include a first computing queue
  • the first computing submodule 8011 may specifically include the following units:
  • the first calculating unit 8011A is configured to calculate, by using the first calculation queue, a first gradient value under the fully connected layer and a second gradient value under the convolution layer.
  • the first graphics processing unit GPU may further include a first communication queue
  • the second graphics processing unit GPU may include a second communication queue
  • the first receiving module 802 may specifically include the following submodules:
  • the first receiving submodule 8021 is configured to receive, by using the first communications queue, a third gradient value sent by the second communications queue;
  • the second receiving module 804 may specifically include the following submodules:
  • the second receiving sub-module 8041 is configured to receive, by using the first communications queue, a fourth gradient value sent by the second communications queue.
  • the second graphics processing unit may further include a second calculation queue, where the third gradient value and the fourth gradient value may be respectively obtained by using the following modules:
  • a fourth calculation module 807 configured to calculate, by using a second calculation queue, a third gradient value under the fully connected layer
  • the fifth calculating module 808 is configured to calculate a fourth gradient value under the convolution layer by using a second calculation queue.
  • the second calculating module 803 may specifically include the following submodules:
  • the first model parameter calculation sub-module 8031 is configured to calculate an average of the first gradient value and the third gradient value to obtain a first model parameter of the Alexnet-like network.
  • the third calculating module 805 may specifically include the following submodules:
  • the second model parameter calculation sub-module 8051 is configured to calculate an average of the second gradient value and the fourth gradient value to obtain a second model parameter.
  • the device may further include the following modules:
  • the determining module 808 is configured to determine whether the network is an Alexnet-like network.
  • the network may include m structural layers, and the determining module 808 may specifically include the following submodules:
  • a calculation time and parameter quantity obtaining sub-module 8081 configured to pre-train the network to obtain a calculation time and a parameter quantity of each structural layer;
  • a total calculation time and total parameter quantity obtaining sub-module 8082 configured to obtain a total calculation time and a total parameter quantity of the network according to the calculation times and parameter quantities;
  • an accumulating sub-module 8083 configured to accumulate the calculation times of the m structural layers layer by layer according to a preset transmission order, to obtain the sum of the calculation times up to the p-th layer;
  • a parameter quantity sum obtaining sub-module 8084 configured to accumulate the parameter quantities of the remaining m-p layers when the ratio of the sum of the calculation times up to the p-th layer to the total calculation time satisfies the first preset condition, to obtain the sum of the parameter quantities of the remaining m-p layers;
  • a determining sub-module 8085 configured to determine whether a ratio of a sum of parameter quantities of the remaining m-p layers and the total parameter quantity satisfies a second preset condition
  • the dividing sub-module 8086 is configured to divide the network into an Alexnet-like network when the second preset condition is met.
  • the dividing sub-module 8086 may specifically include the following units:
  • a fully connected layer dividing unit 8086A configured to divide the first p layers of the network into the fully connected layer of an Alexnet-like network; and
  • a convolutional layer dividing unit 8086B configured to divide the remaining m-p layers into the convolutional layer of an Alexnet-like network.
  • For the device embodiment, since it is substantially similar to the method embodiment, the description is relatively simple; for relevant parts, reference may be made to the description of the method embodiment.
  • embodiments of the embodiments of the present application can be provided as a method, apparatus, or computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, embodiments of the present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology. The information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • As defined herein, computer readable media does not include transitory computer readable media, such as modulated data signals and carrier waves.
  • Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions.
  • These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be stored in a computer readable memory capable of directing a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer And Data Communications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Embodiments of the present application provide a model training method and device for an Alexnet-like network. The method includes: calculating, by using a first graphics processing unit GPU, a first gradient value and a second gradient value under the Alexnet-like network; receiving a third gradient value under the Alexnet-like network sent by a second graphics processing unit GPU; calculating a first model parameter of the Alexnet-like network according to the first gradient value and the third gradient value; receiving a fourth gradient value under the Alexnet-like network sent by the second graphics processing unit GPU; calculating a second model parameter of the Alexnet-like network according to the second gradient value and the fourth gradient value; and training the model of the Alexnet-like network by using the first model parameter and the second model parameter. Calculation and communication are thereby carried out as two separate processes, and the calculation of the convolutional layer of the Alexnet-like network is further performed in parallel with the communication of the fully connected layer parameters, which effectively reduces the time spent in the model training process and improves the operational efficiency of model training.

Description

Model training method and device for an Alexnet-like network
TECHNICAL FIELD
The present application relates to the field of information technology, and in particular to a model training method for an Alexnet-like network and a model training device for an Alexnet-like network.
BACKGROUND
Artificial Intelligence is a new technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. It attempts to understand the essence of intelligence and to produce new intelligent machines that respond in a manner similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems. Since the birth of artificial intelligence, its theory and technology have matured steadily and its application fields have kept expanding. In recent years, Deep Learning has directly attempted to solve the problem of abstract cognition and has made breakthrough progress. The revolution detonated by deep learning has brought artificial intelligence to a new level; it is not only of great academic significance but also highly practical.
The motivation of deep learning is to build neural networks that simulate the human brain for analytical learning; it imitates the mechanism of the human brain to interpret data such as images, sound, and text. Deep learning is usually carried out by building a corresponding network model and training it. The learning models built under different learning frameworks differ greatly; for example, Convolutional Neural Networks (CNNs) are a machine learning model under deep supervised learning, and the Alexnet network is a classic convolutional neural network often used by developers.
FIG. 1 shows an example of the structure of an Alexnet network. In the Alexnet network, the two most important layer types are the convolutional layer, Convolution (Convolution1 to pool5 in FIG. 1), and the fully connected layer, Inner Product (Inner Product6 to the loss layer in FIG. 1). The process of one round of model training in the Alexnet network can be described as follows:
(1) data is first propagated forward from the Data layer to the Top layer, passing first through the convolutional layer portion and then through the fully connected layer portion;
(2) the loss is calculated after propagation reaches the Top layer;
(3) the loss is propagated backward from the Top layer to the Data layer, gradient values are calculated during the propagation, and the connection weights are finally updated; this pass goes first through the fully connected layer portion and then through the convolutional layer portion.
In the Alexnet network, whether in forward propagation or in back propagation, the convolutional layer portion has a very large amount of computation, accounting for more than 80% of the calculation time of the entire network, yet the parameter quantity it needs to update is very small, only 10% of the parameters of the entire network; the fully connected layer portion is exactly the opposite, owning 90% of the parameters to be updated in the entire network while its calculation time accounts for only 20% of the whole.
In a single-machine multi-card environment (that is, a computer equipped with multiple graphics processing units GPUs), in order to obtain lossless training results, a full copy of the model must be maintained on each GPU and training must proceed on both copies simultaneously. Taking two cards (two graphics processing unit GPUs) as an example, the two cards can be divided into a primary card and a secondary card; FIG. 2 is a working principle diagram of the primary card and the secondary card in the prior art. After each round of training, the gradient values calculated by the model on the secondary card need to be sent to the model on the primary card, the primary card updates the parameters after computing the average of the gradient values, and finally the latest model on the primary card is broadcast to the secondary card before the next round of training can continue. In the prior art, all gradient values of all layers are generally computed first and only then sent to the primary card to be averaged and used to update the model; that is, all computation must finish before communication can start, so computation and communication follow a strict temporal order.
Therefore, if, following the prior art, the gradient values of the fully connected layer are computed first and, only after they have been aggregated onto the primary card, the gradient values of the convolutional layer are computed, the whole process takes a very long time and seriously affects the operational efficiency of model training.
发明内容
鉴于上述问题,提出了本申请实施例以便提供一种克服上述问题或者至少部分地解决上述问题的一种类Alexnet网络的模型训练方法和相应的一种类Alexnet网络的模型训练装置。
为了解决上述问题,本申请公开了一种类Alexnet网络的模型训练方法,包括:
采用第一图形处理单元GPU计算在类Alexnet网络下的第一梯度值和第二梯度值;
接收第二图形处理单元GPU发送的在所述类Alexnet网络下的第三梯度值;
依据所述第一梯度值和第三梯度值计算所述类Alexnet网络的第一模型参数;
接收所述第二图形处理单元GPU发送的在所述类Alexnet网络下的第四梯度值;
依据所述第二梯度值和第四梯度值计算所述类Alexnet网络的第二模 型参数;
采用所述第一模型参数和第二模型参数训练所述类Alexnet网络的模型。
可选地,所述类Alexnet网络由全连接层和卷积层组成,所述采用第一图形处理单元GPU计算在类Alexnet网络下的第一梯度值和第二梯度值的步骤包括:
采用第一图形处理单元GPU计算在所述全连接层下的第一梯度值和在所述卷积层下的第二梯度值。
可选地,所述第一图形处理单元GPU包括第一计算队列,所述采用第一图形处理单元GPU计算在所述全连接层下的第一梯度值和在所述卷积层下的第二梯度值的步骤包括:
采用第一计算队列计算在所述全连接层下的第一梯度值和在所述卷积层下的第二梯度值。
可选地,所述第一图形处理单元GPU还包括第一通信队列,所述第二图形处理单元GPU包括第二通信队列,所述接收第二图形处理单元GPU发送的在所述类Alexnet网络下的第三梯度值的步骤包括:
采用第一通信队列接收第二通信队列发送的第三梯度值;
所述接收所述第二图形处理单元GPU发送的在所述类Alexnet网络下的第四梯度值的步骤包括:
采用第一通信队列接收第二通信队列发送的第四梯度值。
可选地,所述第二图形处理单元还包括第二计算队列,所述第三梯度值和所述第四梯度值分别通过如下步骤获得:
采用第二计算队列计算在所述全连接层下的第三梯度值;以及,
采用第二计算队列计算在所述卷积层下的第四梯度值。
可选地,所述依据所述第一梯度值和第三梯度值计算所述类Alexnet网络的第一模型参数的步骤包括:
计算所述第一梯度值和第三梯度值的平均值,获得所述类Alexnet网络的第一模型参数。
可选地，所述依据所述第二梯度值和第四梯度值计算所述类Alexnet网络的第二模型参数的步骤包括：
计算所述第二梯度值和第四梯度值的平均值,获得所述类Alexnet网络的第二模型参数。
可选地,在采用第一图形处理单元GPU计算在类Alexnet网络下的第一梯度值和第二梯度值的步骤前,还包括:
判断网络是否为类Alexnet网络。
可选地,所述网络包括m个结构层,所述判断网络是否为类Alexnet网络的步骤包括:
对所述网络进行预训练,获得每个结构层的计算时间和参数量;
根据所述计算时间和参数量,获得所述网络的计算总时间和总参数量;
按照预设传输顺序，逐层累加所述m个结构层的计算时间，分别获得截至第p层的计算时间之和；
当所述截至第p层的计算时间之和与所述计算总时间的比值满足第一预设条件时,累加剩余m-p层的参数量,获得所述剩余m-p层的参数量之和;
判断所述剩余m-p层的参数量之和与所述总参数量的比值是否满足第二预设条件;
若是,则将所述网络划分为类Alexnet网络。
可选地,所述将所述网络划分为类Alexnet网络的步骤包括:
将所述网络的前p层划分为类Alexnet网络的全连接层;
将所述剩余m-p层划分为类Alexnet网络的卷积层。
为了解决上述问题,本申请还公开了一种类Alexnet网络的模型训练装置,包括:
第一计算模块,用于采用第一图形处理单元GPU计算在类Alexnet网络下的第一梯度值和第二梯度值;
第一接收模块,用于接收第二图形处理单元GPU发送的在所述类Alexnet网络下的第三梯度值;
第二计算模块,用于依据所述第一梯度值和第三梯度值计算所述类Alexnet网络的第一模型参数;
第二接收模块,用于接收所述第二图形处理单元GPU发送的在所述类Alexnet网络下的第四梯度值;
第三计算模块,用于依据所述第二梯度值和第四梯度值计算所述类Alexnet网络的第二模型参数;
训练模块,用于采用所述第一模型参数和第二模型参数训练所述类Alexnet网络的模型。
可选地,所述类Alexnet网络由全连接层和卷积层组成,所述第一计算模块包括:
第一计算子模块,用于采用第一图形处理单元GPU计算在所述全连接层下的第一梯度值和在所述卷积层下的第二梯度值。
可选地,所述第一图形处理单元GPU包括第一计算队列,所述第一计算子模块包括:
第一计算单元,用于采用第一计算队列计算在所述全连接层下的第一梯度值和在所述卷积层下的第二梯度值。
可选地,所述第一图形处理单元GPU还包括第一通信队列,所述第二图形处理单元GPU包括第二通信队列,所述第一接收模块包括:
第一接收子模块,用于采用第一通信队列接收第二通信队列发送的第三梯度值;
所述第二接收模块包括:
第二接收子模块,用于采用第一通信队列接收第二通信队列发送的第四梯度值。
可选地,所述第二图形处理单元还包括第二计算队列,所述第三梯度值和所述第四梯度值分别通过如下模块获得:
第四计算模块,用于采用第二计算队列计算在所述全连接层下的第三梯度值;以及,
第五计算模块,用于采用第二计算队列计算在所述卷积层下的第四梯度值。
可选地,所述第二计算模块包括:
第一模型参数计算子模块,用于计算所述第一梯度值和第三梯度值的平均值,获得所述类Alexnet网络的第一模型参数。
可选地,所述第三计算模块包括:
第二模型参数计算子模块,用于计算所述第二梯度值和第四梯度值的平均值,获得第二模型参数。
可选地,所述装置还包括:
判断模块,用于判断网络是否为类Alexnet网络。
可选地,所述网络包括m个结构层,所述判断模块包括:
计算时间和参数量获得子模块,用于对所述网络进行预训练,获得每个结构层的计算时间和参数量;
计算总时间和总参数量获得子模块,用于根据所述计算时间和参数量,获得所述网络的计算总时间和总参数量;
计算时间之和获得子模块,用于按照预设传输顺序,逐层累加所述m个结构层的计算时间,分别获得截至第p层的计算时间之和;
参数量之和获得子模块,用于在所述截至第p层的计算时间之和与所述计算总时间的比值满足第一预设条件时,累加剩余m-p层的参数量,获得所述剩余m-p层的参数量之和;
判断子模块,用于判断所述剩余m-p层的参数量之和与所述总参数量的比值是否满足第二预设条件;
划分子模块,用于在满足第二预设条件时,将所述网络划分为类Alexnet网络。
可选地,所述划分子模块包括:
全连接层划分单元,用于将所述网络的前p层划分为类Alexnet网络的全连接层;
卷积层划分单元,用于将所述剩余m-p层划分为类Alexnet网络的卷积层。
与背景技术相比,本申请实施例包括以下优点:
本申请实施例通过分别在第一图形单元GPU(主卡)和第二图形单元GPU(从卡)上构建出相应的计算队列和通信队列,采用计算队列执行计算过程,采用通信队列进行数据通信,使计算和通信两个过程分开进行,并进一步使类Alexnet网络的卷积层的计算和全连接参数通信并行,有效地减少了模型训练过程耗费的时间,提高了模型训练的运行效率。
其次，在本申请实施例中，在采用某一网络进行模型训练前，还可以对所述网络进行预训练，通过对预训练获得的时间参数进行分析，以判断所述网络是否属于类Alexnet网络。
附图说明
图1是一种Alexnet网络的结构示例图;
图2是已有技术中主卡与从卡的工作原理图;
图3是本申请的一种类Alexnet网络的模型训练方法实施例一的步骤流程图;
图4是本申请的一种类Alexnet网络的模型训练方法实施例一的工作原理图;
图5是本申请的一种类Alexnet网络的模型训练方法实施例二的步骤流程图;
图6是本申请的一种类Alexnet网络的模型训练方法实施例二的数据反向传播计算时间和参数量统计图;
图7是本申请的判断网络是否为类Alexnet网络的算法流程图;
图8是本申请的一种类Alexnet网络的模型训练装置实施例的结构框图。
具体实施方式
为使本申请的上述目的、特征和优点能够更加明显易懂,下面结合附图和具体实施方式对本申请作进一步详细的说明。
参照图3,示出了本申请的一种类Alexnet网络的模型训练方法实施例一的步骤流程图,具体可以包括如下步骤:
步骤301,采用第一图形处理单元GPU计算在类Alexnet网络下的第一梯度值和第二梯度值;
在Alexnet网络中,无论是正向传播过程还是反向传播过程,卷积层部分都会拥有非常大的计算量,几乎占了整个网络的计算时间80%以上,但卷积层需要更新的参数量却非常小,只占整个网络参数的10%;而全连接层部分的情况则与卷积层完全相反,全连接层部分拥有整个网络90%的待更新参数,但计算时间却只占了整个网络的20%。在本申请实施例中,可以将具有上述特点,并且在数据的正向传播过程中先经过卷积层部分,然后才经过全连接层部分的网络,称为类Alexnet网络。所述类Alexnet网络可以由全连接层和卷积层组成。
图形处理单元GPU（Graphics Processing Unit）又称显示核心、视觉处理器、显示芯片等，是一种专门在个人电脑、工作站、游戏机和一些移动设备（如平板电脑、智能手机等）上进行图像运算工作的微处理器，常用于高性能计算，具有高并发处理数据的特性。在本申请实施例中，第一图形处理单元GPU可以看做是单机多卡环境下的主卡，第二图形处理单元GPU可以看做是单机多卡环境下的从卡。
在初始化时,主卡与从卡两张卡必须持有相同的网络结构,因此在Start之后,主卡需要将该卡的网络结构广播到从卡上,而从卡通过Receive Model过程接收网络结构,使得两张卡保持一致。然后两张卡开始执行相同的行为,目的是进行前向传播,计算Loss值。前向传播顾名思义就是从第一层向最后一层计算的过程。
具体地,前向传播的过程是按照图1中Alexnet网络中每层和每层之间的箭头所指方向进行的。例如从Data传播到Convolusion1,再到relu1...一直到最后的loss层。这时loss层会得出一个Loss值,该值被称为损失值,而后一过程反向传播能够进行的先决条件是需要得出Loss值。对于前向传播过程来说,先经过卷积层,后经过全连接层。
然后,进行后向传播,先经过全连接层,后经过卷积层,并相应地计算各层的梯度值。
梯度是一个数学概念,在处理分类问题或回归问题时,在模型训练的过程中,可以用损失值函数Loss来作为评估分类是否精准或者回归是否准确。一般情况下,训练得比较好的模型的损失值Loss都比较低,而所述Loss值又与神经网络的参数有关,如果所述参数符合应用场景的要求,那么Loss值就会比较低。如果将网络的所有模型参数组成w向量,可以得到Loss值是与w向量有关的,通常,好的w向量能够使Loss值降低。因此,问题可以归结为如何寻找到好的w向量?这需要进行训练,让模型自己去找。模型必须找到能够使得Loss值下降的正确的方向,而梯度这个数学量就是代表了Loss值下降的最快的方向。只要每次让w向量按照梯度这个方向更新一步,那么Loss值就会减少一些。这就是梯度的作用。
具体地,梯度的计算是根据Loss值关于各个w向量的偏微分求出来的,而求偏微分的过程就是在数据的反向传播的过程中进行的。
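For reference, the gradient and the per-step update described in the two paragraphs above can be written compactly as follows; the learning rate \(\eta\) is not named in the text and appears here only to make the update explicit:

```latex
% Gradient of the loss with respect to the parameter vector w = (w_1, ..., w_k),
% and one update step in the direction of steepest descent of the loss:
\[
  \nabla_{w}\,\mathrm{Loss}(w) =
    \left( \frac{\partial\,\mathrm{Loss}}{\partial w_1}, \ldots,
           \frac{\partial\,\mathrm{Loss}}{\partial w_k} \right),
  \qquad
  w \;\leftarrow\; w - \eta\,\nabla_{w}\,\mathrm{Loss}(w)
\]
```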
在本申请实施例中,所述第一梯度值即为全连接层梯度,所述第二梯度值即为卷积层梯度。
因此,所述采用第一图形处理单元GPU计算在类Alexnet网络下的第一梯度值和第二梯度值的步骤具体可以包括如下子步骤:
子步骤3011,采用第一图形处理单元GPU计算在所述全连接层下的第一梯度值和在所述卷积层下的第二梯度值。
通常,在GPU中可以有多个不同的操作队列,即CUDA流,并且该队列中的操作可以按照添加到队列的先后顺序执行,不同流间的操作可以并行执行。CUDA是一种由NVIDIA推出的通用并行计算架构,该架构使GPU能够解决复杂的计算问题,并使得在计算机上实现GPU编程成为可能。
在本申请实施例中,所述第一图形处理单元GPU即主卡上可以包括有第一计算队列和第一通信队列,所述第二图形处理单元GPU即从卡上可以包括有第二计算队列和第二通信队列,所述第一计算队列、第一通信队列、第二计算队列和第二通信队列均是CUDA流,其中,第一计算队列和第二计算队列可以用于计算,而第一通信队列和第二通信队列可以用于通信,以使主卡和从卡的计算和通信分开,实现并行处理。
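As a minimal sketch of this queue layout (not the application's actual code; the structure and function names are assumptions), the two CUDA streams per card could be created as follows:

```cuda
#include <cuda_runtime.h>

// One compute queue and one communication queue per card, mirroring the first and
// second compute/communication queues described above. Both are ordinary CUDA
// streams; non-blocking streams are used so that work placed in one queue never
// implicitly serializes behind work in the other.
struct CardQueues {
    cudaStream_t compute;  // computation (e.g. forward/backward passes)
    cudaStream_t comm;     // communication (e.g. gradient transfers, averaging)
};

CardQueues createCardQueues(int device) {
    CardQueues q;
    cudaSetDevice(device);
    cudaStreamCreateWithFlags(&q.compute, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&q.comm, cudaStreamNonBlocking);
    return q;
}

int main() {
    CardQueues master = createCardQueues(0);  // master card (first GPU)
    CardQueues slave  = createCardQueues(1);  // slave card (second GPU)

    // ... enqueue layer computation on .compute and gradient traffic on .comm ...

    cudaSetDevice(0);
    cudaStreamDestroy(master.compute);
    cudaStreamDestroy(master.comm);
    cudaSetDevice(1);
    cudaStreamDestroy(slave.compute);
    cudaStreamDestroy(slave.comm);
    return 0;
}
```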
在本申请的一种优选实施例中,所述采用第一图形处理单元GPU计算在所述全连接层下的第一梯度值和在所述卷积层下的第二梯度值的子步骤可以进一步包括:
采用第一计算队列计算在所述全连接层下的第一梯度值和在所述卷积层下的第二梯度值。
在具体实现中,在计算全连接层下的第一梯度值时,该过程只是反向传播的前半部分。反向传播的完整过程是从loss层开始(最后一层),按照箭头相反的方向逐层传播。例如从loss层传播到inner produce8,再到drop7...,一直到convolution1。而计算第一梯度值的过程只是包含图中从loss层传播到inner product6的过程。在全连接层相关层的反向传播的过程中,每反向传播一层就会对有参数的层计算出该层的梯度值(有些层不会计算出梯度,因为该层并没有参数,比如loss层,drop层,relu层,只有inner product层才会计算出梯度)并存储在该层中。这一过程计算出来的梯度参数会非常非常多,但是整个计算过程却非常迅速,这是这一过程的特点。
在计算卷积层的第二梯度值时,该过程只是反向传播的后半部分,即从pool5传播到convolution1的过程。在卷积层相关层的反向传播的过程中,每反向传播一层就会对有参数的层会计算出该层的梯度(有些层不会计算出梯度,因为该层并没有参数,比如relu层,norm层,pool层,只有convolution层才会计算出梯度)并存储在该层中。这一过程计算出来的梯度参数会非常非常少,但是整个计算过程会非常慢,这是这一过程的特点。
步骤302,接收第二图形处理单元GPU发送的在所述类Alexnet网络下的第三梯度值;
在本申请实施例中,所述第三梯度值具体可以通过如下步骤获得:
采用第二计算队列计算在所述全连接层下的第三梯度值。
在具体实现中,所述第二图形处理单元GPU即从卡上的第二计算队列计算所述全连接层下的第三梯度值的过程,与主卡上的第一计算队列计算第一梯度值同时并行进行。
在本申请实施例中,所述接收第二图形处理单元GPU发送的在所述类Alexnet网络下的第三梯度值的步骤具体可以包括如下子步骤:
子步骤3021,采用第一通信队列接收第二通信队列发送的第三梯度值。
在本申请实施例中,为了将主卡与从卡的计算与通信过程区分开,可以采用计算队列执行相应的计算过程,采用通信队列执行相应的数据发送与接收,因此,可以采用第一通信队列来接收第二通信队列发送的第三梯度值。
在具体实现中,可以利用流并行的方式,将卷积层的计算和全连接层 的参数通信并行执行,即在主卡采用第一计算队列计算所述第二梯度值时,采用第一通信队列接收第二通信队列发送的第三梯度值,使计算和通信的过程获得时间上的重叠,二者互不干扰。
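A hedged sketch of this overlap on the master card is given below: the convolution backward pass (producing the second gradient value) runs in the compute queue while the communication queue pulls over the slave card's fully-connected gradients (the third gradient value). `backward_conv_layers` is a hypothetical placeholder kernel and the buffer names are illustrative only:

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the convolution-layer backward pass; the real layer
// code would live in the training framework.
__global__ void backward_conv_layers(float* conv_grad, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) {
        conv_grad[i] *= 1.0f;  // placeholder work
    }
}

// On the master card (GPU 0): launch the convolution backward pass in the compute
// queue and, on the communication queue, pull the fully-connected gradients that
// the slave card (GPU 1) has already computed. Because the two calls go to
// different streams, the copy overlaps the kernel in time.
void overlapConvBackwardWithFcTransfer(float* d_conv_grad, size_t conv_count,
                                       float* d_fc_grad_recv,         // buffer on GPU 0
                                       const float* d_fc_grad_slave,  // gradients on GPU 1
                                       size_t fc_count,
                                       cudaStream_t compute_queue,    // first compute queue
                                       cudaStream_t comm_queue) {     // first communication queue
    cudaSetDevice(0);
    const int threads = 256;
    const int blocks = (int)((conv_count + threads - 1) / threads);
    backward_conv_layers<<<blocks, threads, 0, compute_queue>>>(d_conv_grad, conv_count);

    // Peer copy from the slave card; enabling peer access beforehand with
    // cudaDeviceEnablePeerAccess is an optional optimization.
    cudaMemcpyPeerAsync(d_fc_grad_recv, /*dstDevice=*/0,
                        d_fc_grad_slave, /*srcDevice=*/1,
                        fc_count * sizeof(float), comm_queue);
}
```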
步骤303,依据所述第一梯度值和第三梯度值计算所述类Alexnet网络的第一模型参数;
所述第一梯度值与第三梯度值分别是主卡和从卡计算的所述类Alexnet网络的全连接层的梯度,因此,当从卡的数据汇总到主卡上后,需要根据主卡和从卡的数据对全连接层的数据进行更新。
在本申请的一种优选实施例中,所述依据所述第一梯度值和第三梯度值计算所述类Alexnet网络的第一模型参数的步骤具体可以包括如下子步骤:
子步骤3031,计算所述第一梯度值和第三梯度值的平均值,获得所述类Alexnet网络的第一模型参数。
所述第一模型参数即为更新后的全连接层梯度。
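As an illustration (buffer names and launch configuration are assumptions), the averaging of sub-step 3031 could be an element-wise kernel such as the following; issuing it in the communication queue keeps it off the compute queue, consistent with the Update Inner product Gradients process described later:

```cuda
#include <cuda_runtime.h>

// Element-wise mean of the master card's (first) and slave card's (third)
// fully-connected gradients; the result plays the role of the first model parameter.
__global__ void average_gradients(const float* grad_master, const float* grad_slave,
                                  float* grad_avg, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) {
        grad_avg[i] = 0.5f * (grad_master[i] + grad_slave[i]);
    }
}

// Example launch on the master card:
//   const int threads = 256;
//   const int blocks  = (int)((fc_count + threads - 1) / threads);
//   average_gradients<<<blocks, threads, 0, comm_queue>>>(d_fc_grad_master,
//                                                         d_fc_grad_recv,
//                                                         d_fc_grad_avg, fc_count);
```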
步骤304,接收所述第二图形处理单元GPU发送的在所述类Alexnet网络下的第四梯度值;
在本申请实施例中,所述第四梯度值具体可以通过如下步骤获得:
采用第二计算队列计算在所述卷积层下的第四梯度值。
在具体实现中,所述第二图形处理单元GPU即从卡上的第二计算队列计算所述卷积层下的第四梯度值的过程,与主卡上的第一计算队列计算第二梯度值同时并行进行。
在本申请实施例中,所述接收第二图形处理单元GPU发送的在所述类Alexnet网络下的第四梯度值的步骤具体可以包括如下子步骤:
子步骤3041,采用第一通信队列接收第二通信队列发送的第四梯度值。
在本申请实施例中,为了将主卡与从卡的计算与通信过程区分开,可以采用计算队列执行相应的计算过程,采用通信队列执行相应的数据发送与接收,因此,可以采用第一通信队列来接收第二通信队列发送的第四梯度值。
步骤305,依据所述第二梯度值和第四梯度值计算所述类Alexnet网络的第二模型参数;
所述第二梯度值与第四梯度值分别是主卡和从卡计算的所述类Alexnet网络的卷积层的梯度,因此,当从卡的数据汇总到主卡上后,需要根据主卡和从卡的数据对卷积层的数据进行更新。
在本申请的一种优选实施例中,所述依据所述第二梯度值和第四梯度值计算所述类Alexnet网络的第二模型参数的步骤具体可以包括如下子步骤:
子步骤3051,计算所述第二梯度值和第四梯度值的平均值,获得所述类Alexnet网络的第二模型参数。
所述第二模型参数即为更新后的卷积层梯度。
步骤306,采用所述第一模型参数和第二模型参数训练所述类Alexnet网络的模型。
在本申请实施例中,当分别获得所述第一模型参数和第二模型参数后,主卡可以针对所述第一模型参数和第二模型参数对自身的模型参数进行更新,以获得新的训练模型。
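The application does not spell out the exact rule by which the master card applies the first and second model parameters to its own model; as one possibility (an assumption for illustration, not the application's method), a plain SGD-style step could look like this:

```cuda
#include <cuda_runtime.h>

// Hypothetical SGD-style parameter update on the master card: apply an averaged
// gradient (first or second model parameter) to the corresponding weights.
__global__ void apply_averaged_gradient(float* weights, const float* grad_avg,
                                        float learning_rate, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) {
        weights[i] -= learning_rate * grad_avg[i];  // w <- w - lr * averaged gradient
    }
}
```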
在已有技术中,在Alexnet网络下进行模型训练时,每一轮的计算均为依次进行发送/接收模型、Forward前向传播、Backward反向传播,接收/发送梯度值、参数更新。假设Forward前向传播的过程时间为a,Backward反向传播的过程中涉及到全连接层的计算时间为b,涉及到卷积层的计算时间为c,发送/接收全连接层梯度值的时间为m,发送/接收卷积层梯度值的时间为n,那么按照已有技术完成整个过程的总时间T1为:
T1=a+b+c+m+n,其中c>>b,m>>n
而采用本申请实施例的方法,利用流并行方式将反向传播过程中,卷积层的计算和全连接参数通信并行起来后,总时间T2为:
T2=a+b+max(c,m)+n
由于T1-T2=c+m-max(c,m)>0,所以:T1>T2
由上式可知,利用流并行方式将通信和计算并行来优化类Alexnet网络的方案可以有效减少整个过程耗费的时间。
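To make the saving concrete, here is a small worked example with purely hypothetical timings (not measurements from this application), chosen only so that c >> b and m >> n hold:

```latex
% Hypothetical timings: a = 50, b = 20, c = 120, m = 100, n = 10 (ms).
\[
  T_1 = a + b + c + m + n = 50 + 20 + 120 + 100 + 10 = 300\ \text{ms}
\]
\[
  T_2 = a + b + \max(c, m) + n = 50 + 20 + 120 + 10 = 200\ \text{ms}
\]
\[
  T_1 - T_2 = c + m - \max(c, m) = \min(c, m) = 100\ \text{ms}
\]
```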
在本申请实施例中,通过分别在第一图形单元GPU(主卡)和第二图形单元GPU(从卡)上构建出相应的计算队列和通信队列,采用计算队列执行计算过程,采用通信队列进行数据通信,使计算和通信两个过程分开进行,并进一步使类Alexnet网络的卷积层的计算和全连接参数通信并行,有效地减少了模型训练过程耗费的时间,提高了模型训练的运行效率。
参照图5,示出了本申请的一种类Alexnet网络的模型训练方法实施例二的步骤流程图,具体可以包括如下步骤:
步骤501,判断网络是否为类Alexnet网络;
通常，如果将网络一般化并分为两部分，分别称为M和N，且在反向传播时先进行M部分计算，后进行N部分计算，那么当M部分的计算量只占整个计算时间的很小一部分，而其所拥有的参数占据所有参数量的很大一部分，并且N部分具有和M相反的特性时，可以将此种网络看作类Alexnet网络。
在本申请实施例中,对于包括有m个结构层的某一网络,可以首先判断该网络是否为类Alexnet网络。
在本申请的一种优选实施例中,所述判断网络是否为类Alexnet网络的步骤具体可以包括如下子步骤:
子步骤5011,对所述网络进行预训练,获得每个结构层的计算时间和参数量;
子步骤5012,根据所述计算时间和参数量,获得所述网络的计算总时间和总参数量;
子步骤5013,按照预设传输顺序,逐层累加所述m个结构层的计算时间,分别获得截至第p层的计算时间之和;
子步骤5014,当所述截至第p层的计算时间之和与所述计算总时间的比值满足第一预设条件时,累加剩余m-p层的参数量,获得所述剩余m-p层的参数量之和;
子步骤5015,判断所述剩余m-p层的参数量之和与所述总参数量的比值是否满足第二预设条件;
在本申请实施例中,可以利用计算机去判断当前网络是否属于类Alexnet网络,即可以通过预训练获取到的时间参数进行分析。
首先，对所述网络进行预训练，可以获得每个结构层的计算时间和参数量；然后根据所述计算时间和参数量，能够获得所述网络的计算总时间和总参数量；按照预设传输顺序（一般地，所述预设传输顺序可以是反向传输方向，即从所述网络的最后一层传输至第一层的过程），逐层累加所述m个结构层的计算时间，分别获得截至第p层的计算时间之和；当所述截至第p层的计算时间之和与所述计算总时间的比值满足第一预设条件时，累加剩余m-p层的参数量，获得所述剩余m-p层的参数量之和；最后判断所述剩余m-p层的参数量之和与所述总参数量的比值是否满足第二预设条件。
通常,由于Alexnet网络的特点在于计算量大的部分参数量小,而计算量小的部分参数量却很大,因此,本领域技术人员可以据此设置第一预设条件和第二预设条件的具体数值,本申请对所述数值不作具体限定。
具体地,可以把预训练的网络分为两个部分,即M部分和N部分, 那么问题就可以划归为如何选取分割M和N的分界点。进一步地,所述分界点的选取过程可以按照如下方式进行:将该网络在训练前进行若干次预训练过程,并计算每次运行时每个层进行反向传播时的计算时间和拥有的参数量。然后,将所有层的反向传播时间累加和记为T,所有层所拥有的参数量记为V,以反向传播的起始层为起点,不断累加下一层的计算时间,并记为t。当t/T>0.1时停止累加,并将当前层记为第p层。将从第p层到反向传播的最后一层所拥有的参数量总和记为v,若此时v/V<0.1,那么可以认为此网络类型为类Alexnet网络,可以继续执行子步骤5016,若v/V>0.1,则可以认为此网络并不是类Alexnet网络。
子步骤5016,将所述网络划分为类Alexnet网络。
在本申请的一种优选实施例中,所述将所述网络划分为类Alexnet网络的子步骤可以进一步包括:
将所述网络的前p层划分为类Alexnet网络的全连接层;
将所述剩余m-p层划分为类Alexnet网络的卷积层。
当所述网络被确认为属于类Alexnet网络时,可以根据获得的分解点,即子步骤5011-5015中获得的第p层,将所述网络具体划分为全连接层部分和卷积层部分。
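The following host-side sketch is one possible reading of sub-steps 5011 to 5016 and of the p-selection procedure above; the per-layer backward times and parameter counts are assumed to come from the pre-training runs, the 0.1 thresholds follow the example in the text (the first and second preset conditions are otherwise configurable), and all type and function names are illustrative:

```cuda
#include <vector>
#include <numeric>

// Split decision: the first p layers in back-propagation order (index 0 = the
// loss layer) are treated as the fully-connected part, the remaining m - p
// layers as the convolution part.
struct SplitResult {
    bool alexnetLike;
    int  p;
};

// backward_time_ms[i] and param_count[i] describe layer i in back-propagation
// order, e.g. averaged over the pre-training runs mentioned above.
SplitResult classifyNetwork(const std::vector<double>& backward_time_ms,
                            const std::vector<double>& param_count) {
    const int m = (int)backward_time_ms.size();
    const double T = std::accumulate(backward_time_ms.begin(), backward_time_ms.end(), 0.0);
    const double V = std::accumulate(param_count.begin(), param_count.end(), 0.0);
    if (m == 0 || T <= 0.0 || V <= 0.0) {
        return SplitResult{false, 0};
    }

    // Accumulate per-layer time from the start of back-propagation; stop once the
    // running sum exceeds 10% of the total time and take that as the split point p.
    int p = m;
    double t = 0.0;
    for (int i = 0; i < m; ++i) {
        t += backward_time_ms[i];
        if (t / T > 0.1) { p = i + 1; break; }
    }

    // Parameters held by the remaining m - p layers.
    double v = 0.0;
    for (int i = p; i < m; ++i) {
        v += param_count[i];
    }

    // Alexnet-like if the remaining (convolution-side) layers hold under 10% of
    // all parameters, i.e. the first p layers hold roughly 90% of them.
    return SplitResult{v / V < 0.1, p};
}
```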
参照图6，是本申请的一种类Alexnet网络的模型训练方法实施例二的数据反向传播计算时间和参数量统计图，图中横坐标为层数，左部分为反向传播每一层的计算时间统计，纵坐标单位为ms，右部分为每层所拥有的参数量，单位为字节。从反向传播的起始层开始到上文计算出来的第p层之间的所有层计算时间之和为整个反向传播总时间的10%，而此时从第p层到反向传播的最后一层之间的所有层所拥有的参数量之和占据总参数量的10%左右，从而可以说明从反向传播的起始层到第p层具有90%的参数量。进而可以认定，从反向传播的起始层到第p层的部分为M，即全连接层部分，而从第p层到反向传播的最后一层的部分为N，即卷积层部分。上述判断过程可以通过如图7所示的算法流程图表示。
步骤502,采用第一计算队列计算在所述全连接层下的第一梯度值和在所述卷积层下的第二梯度值;
在本申请实施例中,可以分别在第一图形处理单元GPU即主卡,和第二图形处理单元GPU即从卡上分别构建出计算队列和通信队列。具体地,可以在主卡上构建第一计算队列和第一通信队列,在从卡上构建第二计算队列和第二通信队列,由第一计算队列和第二计算队列执行相应的计算过程,由第一通信队列和第二通信队列执行相应的通信过程,以获得时 间上的叠加。
因此,在本申请实施例中,可以采用第一计算队列计算在所述全连接层下的第一梯度值和在所述卷积层下的第二梯度值。
在具体实现中,在计算全连接层下的第一梯度值时,该过程只是反向传播的前半部分。反向传播的完整过程是从loss层开始(最后一层),按照箭头相反的方向逐层传播。例如从loss层传播到inner produce8,再到drop7...,一直到convolution1。而计算第一梯度值的过程只是包含图中从loss层传播到inner product6的过程。在全连接层相关层的反向传播的过程中,每反向传播一层就会对有参数的层计算出该层的梯度值(有些层不会计算出梯度,因为该层并没有参数,比如loss层,drop层,relu层,只有inner product层才会计算出梯度)并存储在该层中。这一过程计算出来的梯度参数会非常非常多,但是整个计算过程却非常迅速,这是这一过程的特点。
在计算卷积层的第二梯度值时,该过程只是反向传播的后半部分,即从pool5传播到convolution1的过程。在卷积层相关层的反向传播的过程中,每反向传播一层就会对有参数的层会计算出该层的梯度(有些层不会计算出梯度,因为该层并没有参数,比如relu层,norm层,pool层,只有convolution层才会计算出梯度)并存储在该层中。这一过程计算出来的梯度参数会非常非常少,但是整个计算过程会非常慢,这是这一过程的特点。
步骤503,采用第一通信队列接收第二通信队列发送的第三梯度值;
在本申请实施例中,所述第三梯度值可以是通过第二图形处理单元GPU即从卡上的第二计算队列计算获得的,计算第三梯度值的过程,与主卡上的第一计算队列计算第一梯度值同时并行进行。
在本申请实施例中,由于主卡与从卡的计算与通信过程分开进行,计算队列执行相应的计算过程,通信队列执行相应的数据发送与接收,因此,可以采用第一通信队列来接收第二通信队列发送的第三梯度值。
在具体实现中,可以利用流并行的方式,将卷积层的计算和全连接层的参数通信并行执行,即在主卡采用第一计算队列计算所述第二梯度值时,采用第一通信队列接收第二通信队列发送的第三梯度值,使计算和通信的过程获得时间上的重叠,二者互不干扰。
步骤504,计算所述第一梯度值和第三梯度值的平均值,获得所述类Alexnet网络的第一模型参数;
在本申请实施例中,所述第一梯度值与第三梯度值分别是主卡和从卡计算的所述类Alexnet网络的全连接层的梯度,因此,当从卡的数据汇总 到主卡上后,需要根据主卡和从卡的数据对全连接层的数据进行更新。在具体实现中,可以将所述第一梯度值和第三梯度值相加求平均值,以获得第一模型参数,所述第一模型参数即为更新后的全连接层梯度。
步骤505,采用第一通信队列接收第二通信队列发送的第四梯度值;
在本申请实施例中,所述第四梯度值可以采用第二计算队列计算获得,然后可以采用第二通信队列将第四梯度值发送至主卡。
在具体实现中,所述第二图形处理单元GPU即从卡上的第二计算队列计算所述卷积层下的第四梯度值的过程,与主卡上的第一计算队列计算第二梯度值同时并行进行。
步骤506,计算所述第二梯度值和第四梯度值的平均值,获得所述类Alexnet网络的第二模型参数;
在具体实现中,可以将所述第二梯度值和第四梯度值相加求平均值,以获得第二模型参数,所述第二模型参数即为更新后的卷积层梯度。
步骤507,采用所述第一模型参数和第二模型参数训练所述类Alexnet网络的模型。
在本申请实施例中，在采用某一网络进行模型训练前，可以对所述网络进行预训练，通过对预训练获得的时间参数进行分析，以判断所述网络是否属于类Alexnet网络。
为了便于理解,请参照图4,下面以一个完整的示例对本申请实施例的主卡与从卡的计算和通信过程作一说明:
1、主卡的BroadCast Model和从卡的Receive Model:主卡的BroadCast Model是将主卡的模型发送到从卡上,而从卡的Receive Model是负责接收主卡的模型。这一过程是为了让两张卡都保留有相同的Alexnet网络结构。之所以要将梯度发送到主卡上再更新,是因为在模型训练时只会更新主卡的模型,而不管从卡。因为在第二轮Forward开始前,需要将主卡的模型广播到从卡上。因此,可以始终以主卡模型为基础。
2、主卡的Forward过程和从卡的Forward过程:该过程两张卡的行为相同,属于前向传播,是按照图1中Alexnet网络中每层和每层之间的箭头所指方向进行。例如从Data传播到Convolusion1,再到relu1...,一直到最后的loss层。这时loss层会得出一个loss值,该值被称为损失值,而后一过程Backward(反向传播)能够进行的先决条件是需要得出loss值,因此必须先进行Forward,后进行Backward。对于Forward过程来说,先经过的是卷积层相关层,后经过的是全连接层相关层。每一层的计算方式均 有些区别,这是由于各层的计算公式不同所致。
3、主卡和从卡的Backward For Inner Product过程:该过程两张卡的行为相同,该过程只是Backward完整的前半部分。Backward完整过程是从loss层开始(即最后一层),按照图1中箭头相反的方向逐层传播。例如从loss传播到inner produce8,再到drop7...,一直到convolution1。而Backward For Inner Product过程只是包含图1中全连接层部分的反向传播。即从loss传播到inner product6的过程。所以,Backward For Inner Product这一过程是全连接层相关层的反向传播过程。在全连接层相关层的反向传播的过程中,每反向传播一层就会对有参数的层会计算出该层的梯度(有些层不会计算出梯度,因为该层并没有参数,比如loss层,drop层,relu层,只有inner product层才会计算出梯度)并存储在该层中。这一过程计算出来的梯度参数会非常非常多,但是整个计算过程却非常迅速,这是这一过程的特点。
4、主卡和从卡的Backward For Convolution过程:该过程两张卡的行为相同,该过程只是Backward完整的后半部分。而Backward For Convolution过程只是包含图1中卷积层部分的反向传播。即从pool5传播到convolution1的过程。所以,Backward For convolution这一过程是卷积层相关层的反向传播过程。在卷积层相关层的反向传播的过程中,每反向传播一层就会对有参数的层会计算出该层的梯度(有些层不会计算出梯度,因为该层并没有参数,比如relu层,norm层,pool层,只有convolution层才会计算出梯度)并存储在该层中。这一过程计算出来的梯度参数会非常非常少,但是整个计算过程会非常慢,这是这一过程的特点。
5、主卡和从卡部分的第一个过程Receive inner product Gradients和Send inner product Gradients:这两个过程是梯度的发送与接收过程。该过程在主卡上是receive过程。即接收从卡计算出来的梯度,在从卡上是send过程,即发送计算的梯度的过程。这两个过程在各自的卡上都是跟在Backward For Innerproduct过程之后,代表该过程必须等待Backward For Innerproduct之后才能进行,但该过程是处在通信队列中的过程,而计算过程处在计算队列中,所以其与Backward For Convolution同时并行执行。
6、主卡和从卡部分的第二个过程Update Inner product Gradients过程:这两个过程是更新全连接层梯度的过程。但是两张卡的行为不一样,主卡是对全连接层梯度取平均的过程,而从卡这个过程是个空过程,即不执行任何行为。但该过程是处在通信队列中的过程,与Backward For Convolution同时并行执行
7、主卡和从卡部分的第三个过程Receive Convolution Gradients和Send Convolution Gradients过程:即主卡接收从卡发送的卷积层梯度参数,而从卡向主卡发送卷积层梯度参数的过程。虽然该过程处在通信队列中,但是它与计算队列的Backward For Convolution有依赖关系,因此这一过程必须要等待Backward For Convolution和Update Inner product Gradients完成之后才能执行。
8、主卡和从卡的Update Convolution Gradients过程:这两个过程是对卷积层梯度取平均的过程。但是两张卡的行为不一样,主卡是对卷积层梯度取平均的过程,而从卡这个过程是个空过程,即不执行任何行为。
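Putting processes 1 to 8 together, the following is a hedged sketch of one training round on the master card; every kernel and buffer mentioned in the comments is a hypothetical placeholder, and only the CUDA stream and event calls are meant to reflect the queue discipline and dependencies described above:

```cuda
#include <cuda_runtime.h>

// One training round on the master card (GPU 0). The kernel launches and peer
// copies are left as comments because they depend on framework-specific buffers.
void masterTrainingRound(cudaStream_t compute_queue, cudaStream_t comm_queue) {
    cudaSetDevice(0);

    cudaEvent_t fc_backward_done, conv_backward_done;
    cudaEventCreateWithFlags(&fc_backward_done, cudaEventDisableTiming);
    cudaEventCreateWithFlags(&conv_backward_done, cudaEventDisableTiming);

    // 1. BroadCast Model: copy the master's parameters to the slave card, e.g.
    //    cudaMemcpyPeerAsync(d_params_slave, 1, d_params_master, 0, bytes, comm_queue);

    // 2. Forward and 3. Backward For Inner Product, both in the compute queue:
    //    forward_pass<<<..., compute_queue>>>(...);
    //    backward_fc_layers<<<..., compute_queue>>>(...);
    cudaEventRecord(fc_backward_done, compute_queue);

    // 4. Backward For Convolution continues in the compute queue:
    //    backward_conv_layers<<<..., compute_queue>>>(...);
    cudaEventRecord(conv_backward_done, compute_queue);

    // 5./6. Meanwhile the communication queue only has to wait for the
    //       fully-connected gradients before receiving the slave's copy and
    //       averaging (Update Inner product Gradients):
    cudaStreamWaitEvent(comm_queue, fc_backward_done, 0);
    //    cudaMemcpyPeerAsync(d_fc_grad_recv, 0, d_fc_grad_slave, 1, fc_bytes, comm_queue);
    //    average_gradients<<<..., comm_queue>>>(...);

    // 7./8. The convolution-gradient transfer depends on Backward For Convolution,
    //       so the communication queue waits for that event before receiving and
    //       averaging the convolution gradients (Update Convolution Gradients):
    cudaStreamWaitEvent(comm_queue, conv_backward_done, 0);
    //    cudaMemcpyPeerAsync(d_conv_grad_recv, 0, d_conv_grad_slave, 1, conv_bytes, comm_queue);
    //    average_gradients<<<..., comm_queue>>>(...);

    cudaStreamSynchronize(comm_queue);
    cudaEventDestroy(fc_backward_done);
    cudaEventDestroy(conv_backward_done);
}
```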
需要说明的是,对于方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请实施例并不受所描述的动作顺序的限制,因为依据本申请实施例,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作并不一定是本申请实施例所必须的。
参照图8,示出了本申请的一种类Alexnet网络的模型训练装置实施例的结构框图,具体可以包括如下模块:
第一计算模块801,用于采用第一图形处理单元GPU计算在类Alexnet网络下的第一梯度值和第二梯度值;
第一接收模块802,用于接收第二图形处理单元GPU发送的在所述类Alexnet网络下的第三梯度值;
第二计算模块803,用于依据所述第一梯度值和第三梯度值计算所述类Alexnet网络的第一模型参数;
第二接收模块804,用于接收所述第二图形处理单元GPU发送的在所述类Alexnet网络下的第四梯度值;
第三计算模块805,用于依据所述第二梯度值和第四梯度值计算所述类Alexnet网络的第二模型参数;
训练模块806,用于采用所述第一模型参数和第二模型参数训练所述类Alexnet网络的模型。
在本申请实施例中,所述类Alexnet网络可以由全连接层和卷积层组成,所述第一计算模块801具体可以包括如下子模块:
第一计算子模块8011,用于采用第一图形处理单元GPU计算在所述 全连接层下的第一梯度值和在所述卷积层下的第二梯度值。
在本申请实施例中,所述第一图形处理单元GPU可以包括有第一计算队列,所述第一计算子模块8011具体可以包括如下单元:
第一计算单元8011A,用于采用第一计算队列计算在所述全连接层下的第一梯度值和在所述卷积层下的第二梯度值。
在本申请实施例中,所述第一图形处理单元GPU还可以包括有第一通信队列,所述第二图形处理单元GPU可以包括有第二通信队列,所述第一接收模块802具体可以包括如下子模块:
第一接收子模块8021,用于采用第一通信队列接收第二通信队列发送的第三梯度值;
所述第二接收模块804具体可以包括如下子模块:
第二接收子模块8041,用于采用第一通信队列接收第二通信队列发送的第四梯度值。
在本申请实施例中,所述第二图形处理单元还可以包括有第二计算队列,所述第三梯度值和所述第四梯度值可以分别通过如下模块获得:
第四计算模块807,用于采用第二计算队列计算在所述全连接层下的第三梯度值;以及,
第五计算模块808,用于采用第二计算队列计算在所述卷积层下的第四梯度值。
在本申请实施例中,所述第二计算模块803具体可以包括如下子模块:
第一模型参数计算子模块8031,用于计算所述第一梯度值和第三梯度值的平均值,获得所述类Alexnet网络的第一模型参数。
在本申请实施例中,所述第三计算模块805具体可以包括如下子模块:
第二模型参数计算子模块8051,用于计算所述第二梯度值和第四梯度值的平均值,获得第二模型参数。
在本申请实施例中,所述装置还可以包括如下模块:
判断模块808,用于判断网络是否为类Alexnet网络。
在本申请实施例中,所述网络可以包括有m个结构层,所述判断模块808具体可以包括如下子模块:
计算时间和参数量获得子模块8081,用于对所述网络进行预训练,获得每个结构层的计算时间和参数量;
计算总时间和总参数量获得子模块8082,用于根据所述计算时间和参数量,获得所述网络的计算总时间和总参数量;
计算时间之和获得子模块8083,用于按照预设传输顺序,逐层累加所 述m个结构层的计算时间,分别获得截至第p层的计算时间之和;
参数量之和获得子模块8084,用于在所述截至第p层的计算时间之和与所述计算总时间的比值满足第一预设条件时,累加剩余m-p层的参数量,获得所述剩余m-p层的参数量之和;
判断子模块8085,用于判断所述剩余m-p层的参数量之和与所述总参数量的比值是否满足第二预设条件;
划分子模块8086,用于在满足第二预设条件时,将所述网络划分为类Alexnet网络。
在本申请实施例中,所述划分子模块8086具体可以包括如下单元:
全连接层划分单元8086A,用于将所述网络的前p层划分为类Alexnet网络的全连接层;
卷积层划分单元8086B,用于将所述剩余m-p层划分为类Alexnet网络的卷积层。
对于装置实施例而言,由于其与方法实施例基本相似,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
本说明书中的各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似的部分互相参见即可。
本领域内的技术人员应明白,本申请实施例的实施例可提供为方法、装置、或计算机程序产品。因此,本申请实施例可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
在一个典型的配置中,所述计算机设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机 存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括非持续性的电脑可读媒体(transitory media),如调制的数据信号和载波。
本申请实施例是参照根据本申请实施例的方法、终端设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理终端设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理终端设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理终端设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理终端设备上,使得在计算机或其他可编程终端设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程终端设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
尽管已描述了本申请实施例的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例做出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本申请实施例范围的所有变更和修改。
最后,还需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者终端设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、 方法、物品或者终端设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者终端设备中还存在另外的相同要素。
以上对本申请所提供的一种类Alexnet网络的模型训练方法和一种类Alexnet网络的模型训练装置,进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。

Claims (20)

  1. 一种类Alexnet网络的模型训练方法,其特征在于,包括:
    采用第一图形处理单元GPU计算在类Alexnet网络下的第一梯度值和第二梯度值;
    接收第二图形处理单元GPU发送的在所述类Alexnet网络下的第三梯度值;
    依据所述第一梯度值和第三梯度值计算所述类Alexnet网络的第一模型参数;
    接收所述第二图形处理单元GPU发送的在所述类Alexnet网络下的第四梯度值;
    依据所述第二梯度值和第四梯度值计算所述类Alexnet网络的第二模型参数;
    采用所述第一模型参数和第二模型参数训练所述类Alexnet网络的模型。
  2. 根据权利要求1所述的方法,其特征在于,所述类Alexnet网络由全连接层和卷积层组成,所述采用第一图形处理单元GPU计算在类Alexnet网络下的第一梯度值和第二梯度值的步骤包括:
    采用第一图形处理单元GPU计算在所述全连接层下的第一梯度值和在所述卷积层下的第二梯度值。
  3. 根据权利要求2所述的方法,其特征在于,所述第一图形处理单元GPU包括第一计算队列,所述采用第一图形处理单元GPU计算在所述全连接层下的第一梯度值和在所述卷积层下的第二梯度值的步骤包括:
    采用第一计算队列计算在所述全连接层下的第一梯度值和在所述卷积层下的第二梯度值。
  4. 根据权利要求3所述的方法,其特征在于,所述第一图形处理单 元GPU还包括第一通信队列,所述第二图形处理单元GPU包括第二通信队列,所述接收第二图形处理单元GPU发送的在所述类Alexnet网络下的第三梯度值的步骤包括:
    采用第一通信队列接收第二通信队列发送的第三梯度值;
    所述接收所述第二图形处理单元GPU发送的在所述类Alexnet网络下的第四梯度值的步骤包括:
    采用第一通信队列接收第二通信队列发送的第四梯度值。
  5. 根据权利要求4所述的方法,其特征在于,所述第二图形处理单元还包括第二计算队列,所述第三梯度值和所述第四梯度值分别通过如下步骤获得:
    采用第二计算队列计算在所述全连接层下的第三梯度值;以及,
    采用第二计算队列计算在所述卷积层下的第四梯度值。
  6. 根据权利要求1-5任一所述的方法,其特征在于,所述依据所述第一梯度值和第三梯度值计算所述类Alexnet网络的第一模型参数的步骤包括:
    计算所述第一梯度值和第三梯度值的平均值,获得所述类Alexnet网络的第一模型参数。
  7. 根据权利要求6所述的方法，其特征在于，所述依据所述第二梯度值和第四梯度值计算所述类Alexnet网络的第二模型参数的步骤包括：
    计算所述第二梯度值和第四梯度值的平均值,获得所述类Alexnet网络的第二模型参数。
  8. 根据权利要求1或2或3或4或5或7所述的方法,其特征在于,在采用第一图形处理单元GPU计算在类Alexnet网络下的第一梯度值和第二梯度值的步骤前,还包括:
    判断网络是否为类Alexnet网络。
  9. 根据权利要求8所述的方法,其特征在于,所述网络包括m个结构层,所述判断网络是否为类Alexnet网络的步骤包括:
    对所述网络进行预训练,获得每个结构层的计算时间和参数量;
    根据所述计算时间和参数量,获得所述网络的计算总时间和总参数量;
    按照预设传输顺序，逐层累加所述m个结构层的计算时间，分别获得截至第p层的计算时间之和；
    当所述截至第p层的计算时间之和与所述计算总时间的比值满足第一预设条件时,累加剩余m-p层的参数量,获得所述剩余m-p层的参数量之和;
    判断所述剩余m-p层的参数量之和与所述总参数量的比值是否满足第二预设条件;
    若是,则将所述网络划分为类Alexnet网络。
  10. 根据权利要求9所述的方法,其特征在于,所述将所述网络划分为类Alexnet网络的步骤包括:
    将所述网络的前p层划分为类Alexnet网络的全连接层;
    将所述剩余m-p层划分为类Alexnet网络的卷积层。
  11. 一种类Alexnet网络的模型训练装置,其特征在于,包括:
    第一计算模块,用于采用第一图形处理单元GPU计算在类Alexnet网络下的第一梯度值和第二梯度值;
    第一接收模块,用于接收第二图形处理单元GPU发送的在所述类Alexnet网络下的第三梯度值;
    第二计算模块,用于依据所述第一梯度值和第三梯度值计算所述类Alexnet网络的第一模型参数;
    第二接收模块,用于接收所述第二图形处理单元GPU发送的在所述 类Alexnet网络下的第四梯度值;
    第三计算模块,用于依据所述第二梯度值和第四梯度值计算所述类Alexnet网络的第二模型参数;
    训练模块,用于采用所述第一模型参数和第二模型参数训练所述类Alexnet网络的模型。
  12. 根据权利要求11所述的装置,其特征在于,所述类Alexnet网络由全连接层和卷积层组成,所述第一计算模块包括:
    第一计算子模块,用于采用第一图形处理单元GPU计算在所述全连接层下的第一梯度值和在所述卷积层下的第二梯度值。
  13. 根据权利要求12所述的装置,其特征在于,所述第一图形处理单元GPU包括第一计算队列,所述第一计算子模块包括:
    第一计算单元,用于采用第一计算队列计算在所述全连接层下的第一梯度值和在所述卷积层下的第二梯度值。
  14. 根据权利要求13所述的装置,其特征在于,所述第一图形处理单元GPU还包括第一通信队列,所述第二图形处理单元GPU包括第二通信队列,所述第一接收模块包括:
    第一接收子模块,用于采用第一通信队列接收第二通信队列发送的第三梯度值;
    所述第二接收模块包括:
    第二接收子模块,用于采用第一通信队列接收第二通信队列发送的第四梯度值。
  15. 根据权利要求14所述的装置,其特征在于,所述第二图形处理单元还包括第二计算队列,所述第三梯度值和所述第四梯度值分别通过如下模块获得:
    第四计算模块,用于采用第二计算队列计算在所述全连接层下的第三 梯度值;以及,
    第五计算模块,用于采用第二计算队列计算在所述卷积层下的第四梯度值。
  16. 根据权利要求11-15任一所述的装置,其特征在于,所述第二计算模块包括:
    第一模型参数计算子模块,用于计算所述第一梯度值和第三梯度值的平均值,获得所述类Alexnet网络的第一模型参数。
  17. 根据权利要求16所述的装置,其特征在于,所述第三计算模块包括:
    第二模型参数计算子模块,用于计算所述第二梯度值和第四梯度值的平均值,获得第二模型参数。
  18. 根据权利要求11或12或13或14或15或17所述的装置,其特征在于,所述装置还包括:
    判断模块,用于判断网络是否为类Alexnet网络。
  19. 根据权利要求18所述的装置,其特征在于,所述网络包括m个结构层,所述判断模块包括:
    计算时间和参数量获得子模块,用于对所述网络进行预训练,获得每个结构层的计算时间和参数量;
    计算总时间和总参数量获得子模块,用于根据所述计算时间和参数量,获得所述网络的计算总时间和总参数量;
    计算时间之和获得子模块,用于按照预设传输顺序,逐层累加所述m个结构层的计算时间,分别获得截至第p层的计算时间之和;
    参数量之和获得子模块,用于在所述截至第p层的计算时间之和与所述计算总时间的比值满足第一预设条件时,累加剩余m-p层的参数量,获得所述剩余m-p层的参数量之和;
    判断子模块,用于判断所述剩余m-p层的参数量之和与所述总参数量的比值是否满足第二预设条件;
    划分子模块,用于在满足第二预设条件时,将所述网络划分为类Alexnet网络。
  20. 根据权利要求19所述的装置,其特征在于,所述划分子模块包括:
    全连接层划分单元,用于将所述网络的前p层划分为类Alexnet网络的全连接层;
    卷积层划分单元,用于将所述剩余m-p层划分为类Alexnet网络的卷积层。
PCT/CN2017/077897 2016-03-31 2017-03-23 一种类Alexnet网络的模型训练方法和装置 WO2017167114A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610201731.1A CN107292385A (zh) 2016-03-31 2016-03-31 一种类Alexnet网络的模型训练方法和装置
CN201610201731.1 2016-03-31

Publications (1)

Publication Number Publication Date
WO2017167114A1 true WO2017167114A1 (zh) 2017-10-05

Family

ID=59962574

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/077897 WO2017167114A1 (zh) 2016-03-31 2017-03-23 一种类Alexnet网络的模型训练方法和装置

Country Status (3)

Country Link
CN (1) CN107292385A (zh)
TW (1) TW201737202A (zh)
WO (1) WO2017167114A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112799834A (zh) * 2021-01-26 2021-05-14 北京迈格威科技有限公司 训练数据分发方法、装置、电子设备及存储介质
CN112949446A (zh) * 2021-02-25 2021-06-11 山东英信计算机技术有限公司 一种物体识别方法、装置、设备及介质

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11574193B2 (en) * 2018-04-28 2023-02-07 Samsung Electronics Co., Ltd. Method and system for training of neural networks using continuously differentiable models
WO2020147142A1 (zh) * 2019-01-16 2020-07-23 华为技术有限公司 一种深度学习模型的训练方法、系统
CN110059813B (zh) * 2019-02-13 2021-04-06 创新先进技术有限公司 利用gpu集群更新卷积神经网络的方法、装置及设备
CN111709513B (zh) * 2019-03-18 2023-06-09 百度在线网络技术(北京)有限公司 长短期记忆网络lstm的训练系统、方法及电子设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104036451A (zh) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 基于多图形处理器的模型并行处理方法及装置
CN104035751A (zh) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 基于多图形处理器的数据并行处理方法及装置
CN104463324A (zh) * 2014-11-21 2015-03-25 长沙马沙电子科技有限公司 一种基于大规模高性能集群的卷积神经网络并行处理方法
US20150161522A1 (en) * 2013-12-06 2015-06-11 International Business Machines Corporation Method and system for joint training of hybrid neural networks for acoustic modeling in automatic speech recognition

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7219085B2 (en) * 2003-12-09 2007-05-15 Microsoft Corporation System and method for accelerating and optimizing the processing of machine learning techniques using a graphics processing unit
US7747070B2 (en) * 2005-08-31 2010-06-29 Microsoft Corporation Training convolutional neural networks on graphics processing units
CN101976207A (zh) * 2010-07-29 2011-02-16 西安交通大学 一种面向gpu的数据流处理方法
ITRM20120094A1 (it) * 2012-03-14 2013-09-14 Istituto Naz Di Fisica Nuclea Re Scheda di interfaccia di rete per nodo di rete di calcolo parallelo su gpu, e relativo metodo di comunicazione internodale
CN103996069B (zh) * 2013-02-20 2018-04-03 百度在线网络技术(北京)有限公司 一种基于多gpu的bpnn训练方法和装置
CN103150596B (zh) * 2013-02-22 2015-12-23 百度在线网络技术(北京)有限公司 一种反向传播神经网络dnn的训练系统
CN103226540B (zh) * 2013-05-21 2015-08-19 中国人民解放军国防科学技术大学 基于分组多流的gpu上多区结构网格cfd加速方法
CN104143327B (zh) * 2013-07-10 2015-12-09 腾讯科技(深圳)有限公司 一种声学模型训练方法和装置
CN103680496B (zh) * 2013-12-19 2016-08-10 百度在线网络技术(北京)有限公司 基于深层神经网络的声学模型训练方法、主机和系统
CN104809426B (zh) * 2014-01-27 2019-04-05 日本电气株式会社 卷积神经网络的训练方法、目标识别方法及装置
CN104899641B (zh) * 2015-05-25 2018-07-13 杭州朗和科技有限公司 深度神经网络学习方法、处理器和深度神经网络学习系统
CN104933463B (zh) * 2015-07-07 2018-01-23 杭州朗和科技有限公司 深度神经网络模型的训练方法和设备

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150161522A1 (en) * 2013-12-06 2015-06-11 International Business Machines Corporation Method and system for joint training of hybrid neural networks for acoustic modeling in automatic speech recognition
CN104036451A (zh) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 基于多图形处理器的模型并行处理方法及装置
CN104035751A (zh) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 基于多图形处理器的数据并行处理方法及装置
CN104463324A (zh) * 2014-11-21 2015-03-25 长沙马沙电子科技有限公司 一种基于大规模高性能集群的卷积神经网络并行处理方法

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112799834A (zh) * 2021-01-26 2021-05-14 北京迈格威科技有限公司 训练数据分发方法、装置、电子设备及存储介质
CN112799834B (zh) * 2021-01-26 2024-05-07 北京迈格威科技有限公司 训练数据分发方法、装置、电子设备及存储介质
CN112949446A (zh) * 2021-02-25 2021-06-11 山东英信计算机技术有限公司 一种物体识别方法、装置、设备及介质

Also Published As

Publication number Publication date
TW201737202A (zh) 2017-10-16
CN107292385A (zh) 2017-10-24

Similar Documents

Publication Publication Date Title
WO2017167114A1 (zh) 一种类Alexnet网络的模型训练方法和装置
US20210295161A1 (en) Training neural networks represented as computational graphs
US11568258B2 (en) Operation method
EP4036724A1 (en) Method for splitting neural network model by using multi-core processor, and related product
Peng et al. More trainable inception-ResNet for face recognition
US11461626B2 (en) Brain-like computing chip and computing device
EP3451242A1 (en) Device and method for performing reversetraining of fully connected layers of neural network
TW201824095A (zh) 用於稀疏神經網路加速的架構
CN111176758B (zh) 配置参数的推荐方法、装置、终端及存储介质
CN110930996B (zh) 模型训练方法、语音识别方法、装置、存储介质及设备
US20220004858A1 (en) Method for processing artificial neural network, and electronic device therefor
CN115357554B (zh) 一种图神经网络压缩方法、装置、电子设备及存储介质
US20230162034A1 (en) Method and apparatus with neural network data input and output control
CN109324901A (zh) 基于区块链的深度学习分布式计算方法、系统和节点
CN114648103A (zh) 用于处理深度学习网络的自动多目标硬件优化
Dumachev On semideterministic finite automata games type
CN116934571A (zh) 任务处理方法、装置、电子设备和存储介质
CN116975686A (zh) 训练学生模型的方法、行为预测方法和装置
JP7412489B2 (ja) 連合学習方法及び装置、電子機器、記憶媒体ならびにコンピュータプログラム
CN115687764A (zh) 车辆轨迹评估模型的训练方法、车辆轨迹评估方法和装置
CN115544307A (zh) 基于关联矩阵的有向图数据特征提取与表达方法和系统
US11231961B2 (en) Scheduling operations
Sun et al. Parallel factorization machine recommended algorithm based on mapreduce
EP4198837A1 (en) Method and system for global explainability of neural networks
CN111027018B (zh) 加速计算设备建模的方法、装置、计算设备及介质

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17773144

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17773144

Country of ref document: EP

Kind code of ref document: A1