CN104463324A - Convolution neural network parallel processing method based on large-scale high-performance cluster - Google Patents


Info

Publication number
CN104463324A
Authority
CN
China
Prior art keywords
node
training
model
parameter
model parameter
Legal status
Pending
Application number
CN201410674860.3A
Other languages
Chinese (zh)
Inventor
王馨
Current Assignee
CHANGSHA MASHA ELECTRONIC TECHNOLOGY Co Ltd
Original Assignee
CHANGSHA MASHA ELECTRONIC TECHNOLOGY Co Ltd
Priority date
Filing date
Publication date
Application filed by CHANGSHA MASHA ELECTRONIC TECHNOLOGY Co Ltd filed Critical CHANGSHA MASHA ELECTRONIC TECHNOLOGY Co Ltd
Priority to CN201410674860.3A
Publication of CN104463324A


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a convolutional neural network parallel processing method based on a large-scale high-performance cluster. The method comprises the steps that: (1) multiple copies of the network model to be trained are constructed, the model parameters of all copies are identical, the number of copies equals the number of nodes in the high-performance cluster, each node holds one model copy, one node is selected as the master node, and the master node is responsible for broadcasting and collecting the model parameters; (2) the training set is divided into a plurality of subsets; in each round the training subsets are distributed to the child nodes other than the master node, which jointly compute the parameter gradients; the gradient values are accumulated, the accumulated value is used to update the model parameters on the master node, and the updated model parameters are broadcast to all child nodes, until model training ends. The method has the advantages of enabling parallelization, improving the efficiency of model training, and shortening the training time.

Description

A convolutional neural network parallel processing method based on a large-scale high-performance computing cluster
Technical field
The present invention relates generally to the field of high-performance computing cluster (HPCC) design, and in particular to a convolutional neural network parallel processing method based on a large-scale high-performance computing cluster.
Background art
A high-performance computer is a computer cluster: multiple computer systems are linked together by high-speed interconnect technology, and the combined computing power of all connected systems is used to handle large-scale computational problems, which is why such a system is also commonly called a "high-performance computing cluster" (HPCC). High-performance computing clusters are mainly used for complex computational problems and are applied in environments that require large-scale scientific computing, such as weather forecasting, petroleum exploration and reservoir simulation, molecular simulation, and gene sequencing. The application programs run on a high-performance computing cluster generally use parallel algorithms: a large problem is divided according to certain rules into many small subproblems that are computed on different nodes of the cluster, and the results of these subproblems are processed and merged into the final result of the original problem. Because these subproblems can generally be computed in parallel, the processing time of the overall problem can be shortened.
During computation, the nodes of a high-performance computing cluster work cooperatively: each node processes a part of the large problem, exchanges data with the other nodes as required, and contributes a part of the final result. The processing power of a high-performance computing cluster is proportional to the scale of the cluster and equals the sum of the processing power of the nodes in the cluster. With the development and porting of a large number of applications, the cluster architecture delivers outstanding performance at lower cost and has therefore become the mainstream of high-performance computing; following this trend, cluster architectures have been widely adopted in high-performance computer systems. As the computing power of both CPUs and GPUs keeps rising, how to integrate the computational resources of the two is bound to become a research hotspot.
A convolutional neural network is a special kind of deep neural network model. Convolutional networks were originally inspired by the mechanism of the visual nervous system; they are multilayer perceptrons designed for recognizing two-dimensional shapes, and this network structure is highly invariant to translation, scaling, tilting, and other forms of distortion. In 1962, Hubel and Wiesel proposed the concept of the receptive field based on their research on the visual cortex cells of cats. In 1984, the Japanese scholar Fukushima proposed the neocognitron model based on the receptive-field concept; it can be regarded as the first implementation of a convolutional neural network and was also the first application of the receptive-field concept in the field of artificial neural networks.
Generally, the basic structure of a convolutional neural network comprises two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and the local feature is extracted; once the local feature has been extracted, its positional relationship to the other features is also determined. The second is the feature mapping layer: each computational layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons in the plane share equal weights. The feature mapping structure uses the sigmoid function as the activation function of the convolutional network, so that the feature maps are shift-invariant. In addition, because the neurons on one mapping plane share weights, the number of free network parameters is reduced. Each convolutional layer in a convolutional neural network is followed by a computational layer for local averaging and secondary extraction, and this characteristic two-stage feature extraction structure reduces the feature resolution.
Convolutional neural networks are mainly used to recognize two-dimensional patterns that are invariant to displacement, scaling, and other forms of distortion. Because the feature detection layers of a convolutional neural network learn from training data, explicit feature extraction is avoided when using a convolutional neural network; features are learned implicitly from the training data. Moreover, because the neurons on the same feature mapping plane share the same weights, the network can learn in parallel, which is a major advantage of convolutional networks over networks in which neurons are fully interconnected. With its special structure of locally shared weights, the convolutional neural network has unique advantages in speech recognition and image processing; its layout is closer to a real biological neural network, weight sharing reduces the complexity of the network, and in particular the fact that images with multidimensional input vectors can be fed directly into the network avoids the complexity of data reconstruction during feature extraction and classification.
Convolutional neural networks have become a research hotspot in speech analysis and image recognition. However, because the network has many layers and an enormous number of weight parameters, training a network model usually takes tens of days or even several months, and the long training time limits the wider adoption of convolutional neural networks. Thanks to the advantage of weight sharing, parallel learning of convolutional neural networks offers a way to address this problem; especially in the current era of ever-increasing GPU computing power, how to integrate parallel computing resources to accelerate the training of convolutional neural networks has also become a research focus.
Current international research on accelerating neural networks concentrates on two directions. First, parallel acceleration on a single server with multiple GPUs: a single server involves no data transfer between multiple nodes, so parallel acceleration is easy to realize, but the size of the network model is limited by the configuration of that single server. Second, using large-scale clusters to accelerate neural network training, for which the DistBelief model has been proposed; however, it has not been applied to convolutional neural networks and is used mainly for restricted Boltzmann machines and deep belief networks. Therefore, combining the computational advantages of a large-scale high-performance computing cluster to realize parallel learning of convolutional neural networks and improve the efficiency of model training is a technical challenge in this field, and is also an important aspect of lowering the learning threshold of convolutional neural networks and broadening their application.
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems existing in the prior art, the present invention provides a convolutional neural network parallel processing method based on a large-scale high-performance computing cluster that can achieve parallelization, improve the efficiency of model training, and reduce the training time.
To solve the above technical problem, the present invention adopts the following technical solution:
A convolutional neural network parallel processing method based on a large-scale high-performance computing cluster, comprising the steps of:
(1) constructing multiple copies of the network model to be trained, wherein the model parameters of all copies are identical, the number of copies equals the number of nodes of the high-performance computing cluster, and each node is assigned one model copy; one node is selected as the master node, responsible for broadcasting and collecting the model parameters;
(2) dividing the training set into several subsets; in each round, distributing a training subset to each of the child nodes other than the master node, jointly computing the parameter gradients, and accumulating the gradient values; the accumulated value is used to update the model parameters on the master node, and the updated model parameters are broadcast to each child node, until model training stops.
As a further improvement of the present invention: in step (1), before each iteration the parameters of the network model are first randomly initialized; the initialized model parameters comprise the weight parameters W and the bias units b. Initialization is first carried out according to the input network configuration, and then the network weight parameters and bias units are initialized layer by layer.
As a further improvement of the present invention: the initialization adopts a rands-style random scheme, so that the parameters take random values between -1 and 1.
As a further improvement of the present invention: the method further comprises a step (3) of updating the model parameters, namely: after a certain number of iterations, each child node sends its accumulated parameter gradients back to the master node, which performs the reduction operation and the model parameter update in a unified manner; the updated model parameters are then distributed to each child node, and each child node computes gradients again, until model training stops.
As a further improvement of the present invention: in step (2), the host process opens a separate thread to prefetch the training set, and a single process performs the data reading and distributes the data set to the other nodes; that is, process 0 is set as the host process and is responsible for reading and sending the data, the remaining compute processes are responsible for receiving the data, and the sending and receiving are realized with MPI_Send and MPI_Recv.
As a further improvement of the present invention: in step (2), each node trains the model parameters in a parallel fashion; that is, each compute node trains the network model parameters on the training data set assigned to that node.
As a further improvement of the present invention: the basic layer structure used in the model training comprises convolutional layers, down-sampling layers and fully connected layers, and each level involves two kinds of computation, forward propagation and backward feedback; the convolutional layer is a feature extraction layer, in which the input of each neuron is connected to the local receptive field of the previous layer and the local feature is extracted; the down-sampling layer is a feature mapping layer, in which each feature map is a plane and all neurons in the plane share equal weights; the fully connected layer integrates the extracted features into a one-dimensional vector, which is finally connected to a classifier to complete the classification function of the whole network; through the forward propagation computation, the computed result is compared with the training label, the error is back-propagated, partial derivatives are computed according to the stochastic gradient descent algorithm (SGD) to obtain the gradient Δw of each model parameter in each level, and Δw is accumulated; the above forward-backward process is repeated and the model parameter gradient Δw is continuously accumulated, and when the number of iterations on each compute node reaches a certain threshold, synchronous communication is carried out to complete the update of the model parameters.
As a further improvement of the present invention: in step (3), when the overall iterative computation reaches a certain number of iterations, all compute nodes send the accumulated parameter gradients Δw back to the host process, and the host process performs a reduction operation on the Δw values returned by each process and updates the model parameters w:
$$\Delta w = \mu \Delta w + \varepsilon \left( \left\langle \frac{\partial E}{\partial w} \right\rangle_i - \omega w \right) \qquad (1)$$
$$w = w + \Delta w \qquad (2)$$
Compared with the prior art, the advantages of the present invention are:
(1) The convolutional neural network algorithm is extended to multiple servers and even to large-scale clusters; the structure of the convolutional neural network algorithm is merged with the DistBelief model, and a new algorithm structure suitable for large-scale clusters is proposed, broadening the scope of application of the algorithm.
(2) The weight-sharing advantage of the convolutional neural network algorithm is exploited more fully: the computation of the convolutional layers is parallelized over the data, so that more computational resources can be used to improve computing efficiency, greatly reducing the tediously long model training time of the original network.
(3) Through the improvement of the algorithm model, the convolutional neural network becomes a new application of the high-performance computing field, so the existing computational resources and computing techniques of that field can be exploited more fully and computing efficiency is greatly optimized. As computing efficiency improves, the scale of applications such as computer vision, speech processing and natural language processing can be further expanded, giving better play to the advantages of convolutional neural networks in these applications.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall system architecture of a heterogeneous high-performance computing cluster.
Fig. 2 is a schematic diagram of the basic neural network unit adopted by the present invention in a specific application.
Fig. 3 is a schematic flow chart of the present invention.
Fig. 4 is a schematic diagram of the basic layer composition of the convolutional neural network adopted by the present invention in a specific application.
Fig. 5 is a schematic diagram of the basic computing operations of the convolutional layer and down-sampling in the convolutional neural network in a specific application of the present invention.
Fig. 6 is a data flow graph of the algorithm of the present invention combined with the HPCC architecture in a specific application.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
In a specific application of the present invention, a large-scale high-performance computing cluster environment first needs to be built. The high-performance computing cluster environment is divided into a software environment and a hardware environment. The hardware is a set of 1 to N independent computers (nodes) with identical configurations, connected by a high-performance internal network; each node can serve as a single computational resource for interactive users, and the nodes can also work together and present themselves as a single, concentrated computational resource for parallel computing tasks.
The tasks of a high-performance computing cluster are mainly concentrated on scientific computing, so the requirements on hardware computing power are high. Besides choosing CPUs with higher frequencies and more cores, GPUs are also indispensable as important computing acceleration devices on a heterogeneous platform; for example, GPUs based on the new-generation Kepler architecture have been adopted on a large scale. At the same time, the biggest potential problem in a high-performance computing cluster is communication, and the time cost of data exchange often becomes the bottleneck of program performance. High-speed InfiniBand fiber interconnection is therefore adopted; it has high-speed, low-latency transmission characteristics and uses a trust-based, flow-controlled mechanism to guarantee the integrity of connections, so packets are rarely lost. InfiniBand has developed rapidly, from SDR mode through DDR and QDR modes to today's FDR mode, and from a single lane, to 4 lanes, up to the 12 lanes supported today, with an end-to-end transfer rate of up to 240 Gbps.
The software environment can use a Linux operating system; commonly used distributions include CentOS, Red Hat and Ubuntu. The compilation environment uses the GNU compilers by default, while the Intel compilers are more efficient if available. The Message Passing Interface (MPI) is currently the most popular distributed-memory parallel programming environment, so a specific implementation is required and the software environment must support running MPI programs; commonly used implementations include MPICH2, Open MPI and Intel MPI. If GPU acceleration is used, compatible CUDA driver, toolkit and SDK versions also need to be selected and installed according to the GPU model.
As shown in Fig. 1, the overall system architecture of the heterogeneous high-performance computing cluster in this application example is as follows. On the left are the heterogeneous compute nodes, the main computational resource; each node is an independent server with a CPU+GPU architecture, which is used more and more widely in the high-performance field. The nodes are interconnected through an InfiniBand switch and a gigabit management switch. InfiniBand is a long-cable connection with high-speed, low-latency transmission characteristics; it uses a trust-based, flow-controlled mechanism to guarantee the integrity of connections, so packets are rarely lost, and a network node using InfiniBand generally needs an HCA card installed. InfiniBand has developed rapidly, from SDR mode through DDR and QDR modes to today's FDR mode, and from a single lane, to 4 lanes, up to the 12 lanes supported today, with an end-to-end transfer rate of up to 240 Gbps. The gigabit management switch is mainly responsible for transmitting commands, not the actual computation data.
The switch on the right is directly interconnected with the I/O nodes, which in turn directly access the disk array; this is suitable for large-scale centralized data storage. A disk array combines disks into a group and, together with a scattered data layout, improves data security. On the one hand, a disk array can have multiple read/write ports and be accessed by multiple nodes simultaneously, improving transmission speed; on the other hand, the redundant array greatly improves data security. The management node at the lower right is provided for users: a user accesses the computational resources indirectly by logging in to the management node, which makes the computational resources easy to manage and convenient to use.
As shown in Fig. 2, the basic neural network unit adopted by the present invention in a specific application is as follows. A neural network models and connects the elementary units of the human brain, the neurons, to explore models that simulate the nervous functions of the human brain, and develops an artificial system with intelligent information-processing functions such as learning, association, memory and pattern recognition. Learning in a neural network is a process: Fig. 2 shows a single neuron; under the stimulation of its environment, sample patterns X1, X2, X3 are fed into the network one after another, and the neuron responds by combining the inputs with the weight matrix W and adding the bias b. According to the value of the result, the weight matrix W of each layer of the network is adjusted with a gradient descent algorithm; when the weights of every layer of the network have converged to certain values, the learning process ends. The generated neural network can then be used to classify real data.
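In conventional notation (a standard formulation, not an equation quoted from the original text), the unit in Fig. 2 computes a weighted sum of its inputs plus the bias and passes it through an activation function f, for example the sigmoid:
$$y = f\Big(\sum_{k} W_k X_k + b\Big), \qquad f(z) = \frac{1}{1 + e^{-z}}$$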
As shown in Fig. 3, the convolutional neural network parallel processing method based on a large-scale high-performance computing cluster of the present invention comprises the steps of:
(1) constructing multiple copies of the network model to be trained, wherein the model parameters of all copies are identical, the number of copies equals the number of nodes of the high-performance computing cluster, and each node is assigned one model copy; one node is selected as the master node, responsible for broadcasting and collecting the model parameters.
(2) dividing the training set into several subsets; in each round, distributing a training subset to the child nodes other than the master node, jointly computing the parameter gradients, and accumulating the gradient values, until model training stops.
In a preferred example, the method also comprises step (3), the update procedure of the model parameters: to guarantee the update of the model parameters, after a certain number of iterations each child node sends its accumulated parameter gradients back to the master node, which performs the reduction operation and the model parameter update in a unified manner; the updated model parameters are then distributed to each child node, and each child node computes gradients again according to the above steps, until model training stops.
In step (1), the model parameters need to be initialized and broadcast. That is, before each iteration the parameters of the network model are first randomly initialized; the main initialized model parameters are the weight parameters W and the bias units b.
Depending on the function and scale of each layer of the network model, the weight parameters W and the bias units b of each layer differ. Initialization is first carried out according to the input network configuration, e.g. net_.reset(new Net<Dtype>(train_net_param)), and then the network weight parameters and bias units are initialized layer by layer.
The concrete initialization can adopt a rands-style random scheme, which has a certain degree of randomness and makes the parameters take random values between -1 and 1.
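A minimal sketch of such a rands-style initialization, assuming the layer's weights and biases are held in a flat float buffer (the buffer layout and the function name are illustrative, not taken from the original implementation):

#include <random>
#include <vector>

// Fill a parameter buffer with values drawn uniformly from [-1, 1],
// mirroring the rands-style random initialization described above.
void rands_init(std::vector<float>& params, unsigned seed) {
  std::mt19937 gen(seed);
  std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
  for (float& p : params) {
    p = dist(gen);  // each weight / bias gets an independent random value
  }
}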
To ensure that each node starts the computation from the same model, the initial values of the model parameters need to be broadcast to each compute node; that is:
MPI_Bcast(net_params[param_id]->mutable_cpu_data(), net_params[param_id]->count(),
          ((sizeof(Dtype) == 8) ? MPI_DOUBLE : MPI_FLOAT), 0, MPI_COMM_WORLD);
In step (2), once the initialized model parameters are available, each node has the basis for carrying out the training computation. Because the model training data set is rather large, in order to reduce the data set reading time the present invention further has the host process open a separate thread to prefetch the training set; by overlapping the data prefetch time with the computation time, execution efficiency is improved.
Because of the particular organization of the training data set database, only one process at a time is allowed read access, so a single process is used to read the data and distribute the data set to the other compute nodes. In the specific implementation, process 0 of the program is set as the host process and is responsible for reading and sending the data; the remaining compute processes are responsible for receiving the data, and the sending and receiving are realized with MPI_Send and MPI_Recv.
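A minimal sketch of this rank-0 read-and-distribute pattern with MPI_Send and MPI_Recv; the batch layout, the message tag and the read_batch() helper are illustrative assumptions rather than the original code:

#include <algorithm>
#include <mpi.h>
#include <vector>

// Hypothetical stand-in for reading one batch from the training database;
// in the described scheme only the host process (rank 0) performs this read.
static void read_batch(std::vector<float>& batch) {
  std::fill(batch.begin(), batch.end(), 0.0f);  // placeholder data
}

// Rank 0 reads one batch per worker and ships it with MPI_Send; every other
// rank receives its batch with MPI_Recv and then trains on it locally.
void distribute_batches(int rank, int world_size, int batch_elems) {
  std::vector<float> batch(batch_elems);
  if (rank == 0) {
    for (int dst = 1; dst < world_size; ++dst) {
      read_batch(batch);
      MPI_Send(batch.data(), batch_elems, MPI_FLOAT, dst, 0, MPI_COMM_WORLD);
    }
  } else {
    MPI_Recv(batch.data(), batch_elems, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    // ... the forward/backward computation on this batch follows here ...
  }
}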
In step (2), each node trains the model parameters in a parallel fashion; that is, each compute node trains the network model parameters on the training data set assigned to that node. Because each compute node keeps a copy of the network model, the computation process on every compute node is identical; only the training data sets differ.
The training process of the model is the same as the general convolutional neural network training process; according to the network configuration, the basic hierarchy comprises convolutional layers, down-sampling layers and fully connected layers, and each level involves two kinds of computation, forward propagation and backward feedback.
The convolutional layer is a feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and the local feature is extracted.
The down-sampling layer is a feature mapping layer (subsampling layer): each feature map is a plane, and all neurons in the plane share equal weights.
The fully connected layer integrates the extracted features into a one-dimensional vector, which is finally connected to a classifier to complete the classification function of the whole network.
Through the forward propagation computation, the computed result is compared with the training label, the error is back-propagated, partial derivatives are computed according to the stochastic gradient descent algorithm (SGD) to obtain the gradient Δw of each model parameter in each level, and Δw is accumulated. The above forward-backward process is repeated and the model parameter gradient Δw is continuously accumulated; when the number of iterations on each compute node reaches a certain threshold, synchronous communication needs to be carried out to complete the update of the model parameters.
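A sketch of this per-node accumulation loop; forward_backward() is a hypothetical placeholder for one forward plus backward pass, and the synchronization itself is only indicated by a comment, not performed, in this fragment:

#include <cstddef>
#include <vector>

// Hypothetical stand-in for one forward + backward pass; it would return the
// gradient of the loss with respect to every model parameter for one batch.
static std::vector<float> forward_backward(std::size_t n_params) {
  return std::vector<float>(n_params, 0.0f);  // placeholder gradient
}

// Worker-side loop: keep accumulating the parameter gradient Δw locally and
// only synchronize with the master node after a fixed number of iterations.
void accumulate_gradients(std::vector<float>& grad_accum, int sync_interval) {
  for (int iter = 0; iter < sync_interval; ++iter) {
    std::vector<float> grad = forward_backward(grad_accum.size());
    for (std::size_t i = 0; i < grad_accum.size(); ++i) {
      grad_accum[i] += grad[i];  // accumulate Δw across iterations
    }
  }
  // At this point grad_accum would be sent back to the host process.
}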
In step (3), when the overall iterative computation reaches a certain number of iterations, all compute nodes send the accumulated parameter gradients Δw back to the host process, and the host process performs a reduction operation on the Δw values returned by each process and updates the model parameters w:
$$\Delta w = \mu \Delta w + \varepsilon \left( \left\langle \frac{\partial E}{\partial w} \right\rangle_i - \omega w \right) \qquad (1)$$
$$w = w + \Delta w \qquad (2)$$
where μ is the momentum factor, ω is the weight decay coefficient, ε is the learning rate, and the subscript i refers to one batch (data packet) of training data.
Because the distributed computation is carried out on a large-scale high-performance computing cluster, the model parameters are still updated after the same number of iterations, but the amount of training data processed per iteration is increased by a factor of N (where N is the number of nodes in the cluster), which is equivalent to enlarging the single-machine training set N times. The parameters in the above formula therefore need to be modified to adapt to this change in scale; the present invention multiplies by a scaling factor N, and the final formulas are:
$$\Delta w = \mu \Delta w + N \varepsilon \left( \left\langle \frac{\partial E}{\partial w} \right\rangle_i - \omega w \right) \qquad (1)$$
$$w = w + \Delta w \qquad (2)$$
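A direct transcription of formulas (1) and (2) as executed on the master node might look like the following sketch; μ, ε, ω and N follow the definitions above, and avg_grad is assumed to be the reduced gradient ⟨∂E/∂w⟩_i already averaged over one batch (an illustrative assumption, since the original update code is not reproduced here):

#include <cstddef>
#include <vector>

// Master-side update following formulas (1) and (2):
//   dw = mu*dw + N*eps*(avg_grad - omega*w)
//   w  = w + dw
void update_parameters(std::vector<float>& w, std::vector<float>& dw,
                       const std::vector<float>& avg_grad,
                       float mu, float eps, float omega, int N) {
  for (std::size_t i = 0; i < w.size(); ++i) {
    dw[i] = mu * dw[i] + N * eps * (avg_grad[i] - omega * w[i]);
    w[i] += dw[i];  // formula (2): apply the momentum-smoothed step
  }
}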
As shown in Fig. 3, the flow of this application example of the present invention is as follows. At the start of the algorithm, the network model configuration file is read first, and the network structure and the model parameters of each layer are initialized. To ensure that the network model copies on all nodes subsequently have identical parameters, the host process uses MPI_Bcast to broadcast the initialized model parameters, so that the copies on all nodes are identical.
When accessing the training data set stored on the disk array, only a single process is supported because of the particular organization of the training data set database, so the approach taken here is that the host process is responsible for reading the training data set and distributing it to the other compute nodes. At the same time, because the training data set is generally tens or even hundreds of gigabytes in size and each iteration needs to access one image package of batchsize size, data prefetching is used to hide the data reading time. Each time, the host process opens a separate thread responsible for prefetching and sending the data: it reads batchsize items and sends them to a compute node, while the host process itself also handles the later reduction of the gradient values and the parameter update. The computation time is thus used to hide the data transmission time, improving execution efficiency.
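A minimal sketch of this prefetching idea, assuming a simple double-buffering scheme with one background thread; load_next_batch() and the buffer handling are illustrative, not the original implementation:

#include <algorithm>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for reading one batchsize-sized batch from storage.
static void load_next_batch(std::vector<float>& buf) {
  std::fill(buf.begin(), buf.end(), 0.0f);  // placeholder for the real read
}

// While the current batch is dispatched to a compute node, a separate thread
// of the host process already reads the next batch, hiding the I/O time.
void prefetch_and_dispatch(int num_batches, int batch_elems) {
  std::vector<float> current(batch_elems), next(batch_elems);
  load_next_batch(current);
  for (int b = 0; b < num_batches; ++b) {
    std::thread prefetcher(load_next_batch, std::ref(next));  // read ahead
    // ... MPI_Send of `current` to a compute node would happen here ...
    prefetcher.join();
    std::swap(current, next);  // the prefetched batch becomes the current one
  }
}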
After receiving its data set, each compute node can carry out model training, forward computation and back-propagation, continuously accumulating Δw. When the overall number of iterations reaches a certain threshold, each child node sends Δw back to the host process; the host process reduces the Δw values, updates the weight matrix w with the reduced Δw, and broadcasts the updated weight matrix to each child node again, and the child nodes resume the training of the model parameters. The above computation process is repeated until the required number of iterations is reached or the model parameters finally converge, and the algorithm is complete.
As shown in Fig. 4, the basic layer composition of the convolutional neural network in this application example of the present invention is as follows. A convolutional neural network is a multilayer neural network; each layer consists of multiple two-dimensional planes, and each plane consists of multiple independent neurons. As in Fig. 4, the input image is convolved with three trainable filters and biases, producing three feature maps at layer C1; then every group of four pixels in each feature map is summed, weighted and biased, and passed through a sigmoid activation function to obtain the three feature maps of layer S2. These maps are filtered again to obtain layer C3, and in the same way as for S2 this hierarchy then produces S4. Finally, these pixel values are rasterized and connected into a vector that is fed into a traditional neural network to produce the output.
Generally, a C layer is a feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and the local feature is extracted; once the local feature has been extracted, its positional relationship to the other features is also determined. An S layer is a feature mapping layer: each computational layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons in the plane share equal weights. The feature mapping structure uses the sigmoid function, whose influence function kernel is small, as the activation function of the convolutional network, so that the feature maps are shift-invariant.
In addition, because the neurons on one mapping plane share weights, the number of free network parameters is reduced, which reduces the complexity of parameter selection. Each feature extraction layer (C layer) in the convolutional neural network is followed by a computational layer (S layer) for local averaging and secondary extraction, and this characteristic two-stage feature extraction structure gives the network a higher tolerance to distortion of the input samples during recognition.
Referring to Fig. 5, which shows the basic computing operations of the convolutional layer and down-sampling in the convolutional neural network in a specific application of the present invention. The convolution process comprises: convolving an input image (in the first stage this is the original input image; in later stages it is a convolutional feature map) with a trainable filter f_x and adding a bias b_x to obtain the convolutional layer C_x. The sub-sampling process comprises: summing the four pixels of each neighborhood into one pixel, weighting it by a scalar W_{x+1}, adding a bias b_{x+1}, and passing the result through a sigmoid activation function, producing a feature map S_{x+1} reduced roughly by a factor of four. The main operation is convolution, whose computing behavior can be expressed with a general formula.
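The standard form of this convolutional-layer computation (given here as a conventional textbook expression rather than as the patent's own equation) is:
$$x_j^{\ell} = f\Big(\sum_{i \in M_j} x_i^{\ell-1} * k_{ij}^{\ell} + b_j^{\ell}\Big)$$
where $x_j^{\ell}$ is the j-th feature map of layer ℓ, $k_{ij}^{\ell}$ is a trainable convolution kernel, $M_j$ is the set of input maps connected to output map j, $b_j^{\ell}$ is the bias, * denotes convolution, and f is the activation function (for example the sigmoid).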
As shown in Fig. 6, the data flow of the algorithm of the present invention combined with the HPCC architecture, when applied to a high-performance computing cluster, is as follows. Stochastic gradient descent (SGD) is probably the most commonly used optimization method for training deep neural networks. Unfortunately, the traditional SGD method is inherently sequential, which makes it unsuitable for large data sets, because the machine-to-machine data movement required by this fully serial mode is very time-consuming. To apply SGD to large data sets, a variant of stochastic gradient descent that uses multiple distributed model copies is adopted, as follows: the training set is divided into several subsets, and a different training subset is run on each independent model copy. All communication between model copies exchanges data over InfiniBand; the host process is responsible for maintaining and updating the model parameters and for broadcasting them to the other child nodes. Before each batch is processed, every model copy receives the latest model parameters from the host process. After obtaining the updated model parameters, a copy runs batchsize samples to compute the parameter gradients and pushes them to the host process for updating the current model parameter values. The host process reduces the gradient values, updates the model parameters, and broadcasts the new model parameters to each node, after which each child node can start the computation for the next batchsize samples.
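One synchronization round of the data flow in Fig. 6 can be sketched with MPI as follows, assuming rank 0 is the host process; the parameter update is deliberately simplified (the full momentum update of formulas (1) and (2) would go in its place), and the function and variable names are illustrative:

#include <cstddef>
#include <mpi.h>
#include <vector>

// One synchronization round: sum the locally accumulated gradients of all
// model copies onto the master, update the parameters there, and broadcast
// the new parameters so every copy starts the next round from the same model.
void synchronization_round(std::vector<float>& params,
                           std::vector<float>& local_dw) {
  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // Reduction step: sum the accumulated Δw of every model copy onto rank 0.
  std::vector<float> global_dw(local_dw.size(), 0.0f);
  MPI_Reduce(local_dw.data(), global_dw.data(),
             static_cast<int>(local_dw.size()), MPI_FLOAT, MPI_SUM,
             0, MPI_COMM_WORLD);

  if (rank == 0) {
    for (std::size_t i = 0; i < params.size(); ++i) {
      params[i] += global_dw[i];  // simplified stand-in for formulas (1)-(2)
    }
  }

  // Broadcast the updated model parameters back to every model copy.
  MPI_Bcast(params.data(), static_cast<int>(params.size()), MPI_FLOAT,
            0, MPI_COMM_WORLD);
}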
The above are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited to the above embodiments; all technical solutions that fall under the idea of the present invention belong to the protection scope of the present invention. It should be pointed out that several improvements and modifications made by those skilled in the art without departing from the principles of the present invention should also be regarded as falling within the protection scope of the present invention.

Claims (8)

1. A convolutional neural network parallel processing method based on a large-scale high-performance computing cluster, characterized in that the steps are:
(1) constructing multiple copies of the network model to be trained, wherein the model parameters of all copies are identical, the number of copies equals the number of nodes of the high-performance computing cluster, and each node is assigned one model copy; one node is selected as the master node, responsible for broadcasting and collecting the model parameters;
(2) dividing the training set into several subsets; in each round, distributing a training subset to each of the child nodes other than the master node, jointly computing the parameter gradients, and accumulating the gradient values; the accumulated value is used to update the model parameters on the master node, and the updated model parameters are broadcast to each child node, until model training stops.
2. The convolutional neural network parallel processing method based on a large-scale high-performance computing cluster according to claim 1, characterized in that in step (1), before each iteration the parameters of the network model are first randomly initialized; the initialized model parameters comprise the weight parameters W and the bias units b; initialization is first carried out according to the input network configuration, and then the network weight parameters and bias units are initialized layer by layer.
3. The convolutional neural network parallel processing method based on a large-scale high-performance computing cluster according to claim 2, characterized in that the initialization adopts a rands-style random scheme, so that the parameters take random values between -1 and 1.
4. The convolutional neural network parallel processing method based on a large-scale high-performance computing cluster according to claim 1, 2 or 3, characterized in that the method further comprises a step (3) of updating the model parameters, namely: after a certain number of iterations, each child node sends its accumulated parameter gradients back to the master node, which performs the reduction operation and the model parameter update in a unified manner; the updated model parameters are then distributed to each child node, and each child node computes gradients again, until model training stops.
5. The convolutional neural network parallel processing method based on a large-scale high-performance computing cluster according to claim 1, 2 or 3, characterized in that in step (2), the host process opens a separate thread to prefetch the training set, and a single process performs the data reading and distributes the data set to the other nodes; that is, process 0 is set as the host process and is responsible for reading and sending the data, the remaining compute processes are responsible for receiving the data, and the sending and receiving are realized with MPI_Send and MPI_Recv.
6. The convolutional neural network parallel processing method based on a large-scale high-performance computing cluster according to claim 1, 2 or 3, characterized in that in step (2), each node trains the model parameters in a parallel fashion; that is, each compute node trains the network model parameters on the training data set assigned to that node.
7. The convolutional neural network parallel processing method based on a large-scale high-performance computing cluster according to claim 4, characterized in that the basic layer structure used in the model training comprises convolutional layers, down-sampling layers and fully connected layers, and each level involves two kinds of computation, forward propagation and backward feedback; the convolutional layer is a feature extraction layer, in which the input of each neuron is connected to the local receptive field of the previous layer and the local feature is extracted; the down-sampling layer is a feature mapping layer, in which each feature map is a plane and all neurons in the plane share equal weights; the fully connected layer integrates the extracted features into a one-dimensional vector, which is finally connected to a classifier to complete the classification function of the whole network; through the forward propagation computation, the computed result is compared with the training label, the error is back-propagated, partial derivatives are computed according to the stochastic gradient descent algorithm (SGD) to obtain the gradient Δw of each model parameter in each level, and Δw is accumulated; the above forward-backward process is repeated and the model parameter gradient Δw is continuously accumulated, and when the number of iterations on each compute node reaches a certain threshold, synchronous communication is carried out to complete the update of the model parameters.
8. The convolutional neural network parallel processing method based on a large-scale high-performance computing cluster according to claim 7, characterized in that in step (3), when the overall iterative computation reaches a certain number of iterations, all compute nodes send the accumulated parameter gradients Δw back to the host process, and the host process performs a reduction operation on the Δw values returned by each process and updates the model parameters w:
$$\Delta w = \mu \Delta w + \varepsilon \left( \left\langle \frac{\partial E}{\partial w} \right\rangle_i - \omega w \right) \qquad (1)$$
$$w = w + \Delta w \qquad (2)$$
CN201410674860.3A 2014-11-21 2014-11-21 Convolution neural network parallel processing method based on large-scale high-performance cluster Pending CN104463324A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410674860.3A CN104463324A (en) 2014-11-21 2014-11-21 Convolution neural network parallel processing method based on large-scale high-performance cluster

Publications (1)

Publication Number Publication Date
CN104463324A true CN104463324A (en) 2015-03-25

Family

ID=52909329

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2659867B2 (en) * 1990-02-20 1997-09-30 インターナショナル・ビジネス・マシーンズ・コーポレイション Method of constructing neural network and defining neural network model
US20040220891A1 (en) * 2003-02-28 2004-11-04 Samsung Electronics Co., Ltd. Neural networks decoder
CN103605972A (en) * 2013-12-10 2014-02-26 康江科技(北京)有限责任公司 Non-restricted environment face verification method based on block depth neural network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Jeffrey Dean et al., "Large Scale Distributed Deep Networks", NIPS'12: Proceedings of the 25th International Conference on Neural Information Processing Systems *
凡保磊, "Research on the Parallelization of Convolutional Neural Networks" (卷积神经网络的并行化研究), China Master's Theses Full-text Database, Information Science and Technology Series *
李葆青, "A Pattern Classifier Based on Convolutional Neural Networks" (基于卷积神经网络的模式分类器), Journal of Dalian University (大连大学学报) *
龚丁禧, "Sparse Self-Combining Spatio-Temporal Convolutional Neural Network Action Recognition and Its Parallelization" (稀疏自组合时空卷积神经网络动作识别方法及其并行化), China Master's Theses Full-text Database, Information Science and Technology Series *

CN106203621A (en) * 2016-07-11 2016-12-07 姚颂 The processor calculated for convolutional neural networks
CN106203621B (en) * 2016-07-11 2019-04-30 北京深鉴智能科技有限公司 The processor calculated for convolutional neural networks
CN107688493A (en) * 2016-08-05 2018-02-13 阿里巴巴集团控股有限公司 Train the method, apparatus and system of deep neural network
CN107688493B (en) * 2016-08-05 2021-06-18 阿里巴巴集团控股有限公司 Method, device and system for training deep neural network
CN106293942A (en) * 2016-08-10 2017-01-04 中国科学技术大学苏州研究院 Neutral net load balance optimization method based on the many cards of multimachine and system
CN106355247A (en) * 2016-08-16 2017-01-25 北京比特大陆科技有限公司 Method for data processing and device, chip and electronic equipment
CN106355247B (en) * 2016-08-16 2019-03-08 算丰科技(北京)有限公司 Data processing method and device, chip and electronic equipment
CN106297297A (en) * 2016-11-03 2017-01-04 成都通甲优博科技有限责任公司 Traffic jam judging method based on degree of depth study
CN108073550A (en) * 2016-11-14 2018-05-25 耐能股份有限公司 Buffer unit and convolution algorithm apparatus and method
CN108122032A (en) * 2016-11-29 2018-06-05 华为技术有限公司 A kind of neural network model training method, device, chip and system
CN106650925A (en) * 2016-11-29 2017-05-10 郑州云海信息技术有限公司 Deep learning framework Caffe system and algorithm based on MIC cluster
CN108122032B (en) * 2016-11-29 2020-02-14 华为技术有限公司 Neural network model training method, device, chip and system
CN108154237A (en) * 2016-12-06 2018-06-12 华为技术有限公司 A kind of data processing system and method
CN108154237B (en) * 2016-12-06 2022-04-05 华为技术有限公司 Data processing system and method
WO2018103562A1 (en) * 2016-12-06 2018-06-14 华为技术有限公司 Data processing system and method
CN106599898A (en) * 2016-12-13 2017-04-26 郑州云海信息技术有限公司 Image feature extraction method and system
CN108229687B (en) * 2016-12-14 2021-08-24 腾讯科技(深圳)有限公司 Data processing method, data processing device and electronic equipment
US10943324B2 (en) 2016-12-14 2021-03-09 Tencent Technology (Shenzhen) Company Limited Data processing method, apparatus, and electronic device
CN108229687A (en) * 2016-12-14 2018-06-29 腾讯科技(深圳)有限公司 Data processing method, data processing equipment and electronic equipment
WO2018107934A1 (en) * 2016-12-14 2018-06-21 腾讯科技(深圳)有限公司 Data processing method and apparatus, and electronic device
CN108171323A (en) * 2016-12-28 2018-06-15 上海寒武纪信息科技有限公司 A kind of artificial neural networks device and method
CN108154228A (en) * 2016-12-28 2018-06-12 上海寒武纪信息科技有限公司 A kind of artificial neural networks device and method
CN108171323B (en) * 2016-12-28 2021-03-26 上海寒武纪信息科技有限公司 Artificial neural network computing device and method
CN108268946A (en) * 2016-12-31 2018-07-10 上海兆芯集成电路有限公司 The neural network unit of circulator with array-width sectional
CN106991474B (en) * 2017-03-28 2019-09-24 华中科技大学 The parallel full articulamentum method for interchanging data of deep neural network model and system
CN106991474A (en) * 2017-03-28 2017-07-28 华中科技大学 The parallel full articulamentum method for interchanging data of deep neural network model and system
CN106951926A (en) * 2017-03-29 2017-07-14 山东英特力数据技术有限公司 The deep learning systems approach and device of a kind of mixed architecture
CN108694690A (en) * 2017-04-08 2018-10-23 英特尔公司 Subgraph in frequency domain and the dynamic select to the convolution realization on GPU
CN111915025A (en) * 2017-05-05 2020-11-10 英特尔公司 Immediate deep learning in machine learning for autonomous machines
CN111915025B (en) * 2017-05-05 2024-04-30 英特尔公司 Instant deep learning in machine learning for autonomous machines
CN107146027A (en) * 2017-05-09 2017-09-08 华东师范大学 A kind of factory's intelligent early-warning system
CN107038506A (en) * 2017-05-09 2017-08-11 华东师范大学 A kind of factory's intelligent early-warning method
CN107085743A (en) * 2017-05-18 2017-08-22 郑州云海信息技术有限公司 A kind of deep learning algorithm implementation method and platform based on domestic many-core processor
CN110892477A (en) * 2017-06-08 2020-03-17 D5Ai有限责任公司 Gradient direction data segmentation for neural networks
CN109146073A (en) * 2017-06-16 2019-01-04 华为技术有限公司 A kind of neural network training method and device
US11475300B2 (en) 2017-06-16 2022-10-18 Huawei Technologies Co., Ltd. Neural network training method and apparatus
CN109146073B (en) * 2017-06-16 2022-05-24 华为技术有限公司 Neural network training method and device
WO2019001071A1 (en) * 2017-06-28 2019-01-03 浙江大学 Adjacency matrix-based graph feature extraction system and graph classification system and method
CN107341127B (en) * 2017-07-05 2020-04-14 西安电子科技大学 Convolutional neural network acceleration method based on OpenCL standard
CN107341127A (en) * 2017-07-05 2017-11-10 西安电子科技大学 Convolutional neural networks accelerated method based on OpenCL standards
CN107563507A (en) * 2017-08-29 2018-01-09 南京中蓝数智信息技术有限公司 Deep learning method based on big data
US11354133B2 (en) 2017-08-31 2022-06-07 Cambricon Technologies Corporation Limited Processing device and related products
CN110245752A (en) * 2017-08-31 2019-09-17 北京中科寒武纪科技有限公司 A kind of connection operation method and device entirely
US11531553B2 (en) 2017-08-31 2022-12-20 Cambricon Technologies Corporation Limited Processing device and related products
US11561800B2 (en) 2017-08-31 2023-01-24 Cambricon Technologies Corporation Limited Processing device and related products
US11334363B2 (en) 2017-08-31 2022-05-17 Cambricon Technologies Corporation Limited Processing device and related products
US11347516B2 (en) 2017-08-31 2022-05-31 Cambricon Technologies Corporation Limited Processing device and related products
US11775311B2 (en) 2017-08-31 2023-10-03 Cambricon Technologies Corporation Limited Processing device and related products
US11409535B2 (en) 2017-08-31 2022-08-09 Cambricon Technologies Corporation Limited Processing device and related products
CN111052155B (en) * 2017-09-04 2024-04-16 华为技术有限公司 Distribution of asynchronous gradient averages random gradient descent method
CN111052155A (en) * 2017-09-04 2020-04-21 华为技术有限公司 Distributed random gradient descent method for asynchronous gradient averaging
CN107563392A (en) * 2017-09-07 2018-01-09 西安电子科技大学 The YOLO object detection methods accelerated using OpenCL
CN107590534B (en) * 2017-10-17 2021-02-09 北京小米移动软件有限公司 Method and device for training deep convolutional neural network model and storage medium
CN107590534A (en) * 2017-10-17 2018-01-16 北京小米移动软件有限公司 Train the method, apparatus and storage medium of depth convolutional neural networks model
US11900242B2 (en) 2017-12-14 2024-02-13 Cambricon Technologies Corporation Limited Integrated circuit chip apparatus
CN108304924A (en) * 2017-12-21 2018-07-20 内蒙古工业大学 A kind of pipeline system pre-training method of depth confidence net
CN108021395A (en) * 2017-12-27 2018-05-11 北京金山安全软件有限公司 Data parallel processing method and system for neural network
CN108021395B (en) * 2017-12-27 2022-04-29 北京金山安全软件有限公司 Data parallel processing method and system for neural network
CN110018970B (en) * 2018-01-08 2023-07-21 腾讯科技(深圳)有限公司 Cache prefetching method, device, equipment and computer readable storage medium
CN110018970A (en) * 2018-01-08 2019-07-16 腾讯科技(深圳)有限公司 Cache prefetching method, apparatus, equipment and computer readable storage medium
CN108090565A (en) * 2018-01-16 2018-05-29 电子科技大学 Accelerated method is trained in a kind of convolutional neural networks parallelization
CN108268638A (en) * 2018-01-18 2018-07-10 浙江工业大学 A kind of generation confrontation network distribution type implementation method based on Spark frames
CN111819578A (en) * 2018-02-17 2020-10-23 超威半导体公司 Asynchronous training for optimization of neural networks using distributed parameter servers with rush updates
CN110197271A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
CN110197270A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
CN110197271B (en) * 2018-02-27 2020-10-27 上海寒武纪信息科技有限公司 Integrated circuit chip device and related product
CN110197270B (en) * 2018-02-27 2020-10-30 上海寒武纪信息科技有限公司 Integrated circuit chip device and related product
CN110197263A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
CN110197268A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
CN110322020B (en) * 2018-03-28 2023-05-12 国际商业机器公司 Adaptive learning rate scheduling for distributed random gradient descent
CN110322020A (en) * 2018-03-28 2019-10-11 国际商业机器公司 The autoadapted learning rate scheduling of distributed random gradient decline
CN109272118B (en) * 2018-08-10 2020-03-06 北京达佳互联信息技术有限公司 Data training method, device, equipment and storage medium
CN109272118A (en) * 2018-08-10 2019-01-25 北京达佳互联信息技术有限公司 Data training method, device, equipment and storage medium
TWI696072B (en) * 2018-08-20 2020-06-11 旺宏電子股份有限公司 Data storage apparatus, system and method
CN109255755A (en) * 2018-10-24 2019-01-22 上海大学 Image super-resolution rebuilding method based on multiple row convolutional neural networks
CN109255755B (en) * 2018-10-24 2023-05-23 上海大学 Image super-resolution reconstruction method based on multi-column convolutional neural network
CN111178492A (en) * 2018-11-09 2020-05-19 中科寒武纪科技股份有限公司 Computing device, related product and computing method for executing artificial neural network model
CN111191738A (en) * 2018-11-16 2020-05-22 京东城市(南京)科技有限公司 Cross-platform data processing method, device, equipment and readable storage medium
CN109631848B (en) * 2018-12-14 2021-04-16 山东鲁能软件技术有限公司 Transmission line foreign matter intrusion detection system and detection method
CN109631848A (en) * 2018-12-14 2019-04-16 山东鲁能软件技术有限公司 Electric line foreign matter intruding detection system and detection method
CN109657794B (en) * 2018-12-20 2022-09-06 中国科学技术大学 Instruction queue-based distributed deep neural network performance modeling method
CN109657794A (en) * 2018-12-20 2019-04-19 中国科学技术大学 A kind of distributed deep neural network performance modelling method of queue based on instruction
US11640531B2 (en) 2019-02-13 2023-05-02 Advanced New Technologies Co., Ltd. Method, apparatus and device for updating convolutional neural network using GPU cluster
CN110059813A (en) * 2019-02-13 2019-07-26 阿里巴巴集团控股有限公司 The method, device and equipment of convolutional neural networks is updated using GPU cluster
CN110059813B (en) * 2019-02-13 2021-04-06 创新先进技术有限公司 Method, device and equipment for updating convolutional neural network by using GPU cluster
CN110096346A (en) * 2019-03-29 2019-08-06 广州思德医疗科技有限公司 A kind of training mission processing method and processing device of more calculate nodes
CN110209503A (en) * 2019-08-01 2019-09-06 上海燧原智能科技有限公司 Specification calculation method, device, equipment and the medium of multidimensional tensor
CN110209503B (en) * 2019-08-01 2019-10-25 上海燧原智能科技有限公司 Specification calculation method, device, equipment and the medium of multidimensional tensor
WO2021017293A1 (en) * 2019-08-01 2021-02-04 平安科技(深圳)有限公司 Rule training method, apparatus, device, and storage medium
CN110472731A (en) * 2019-08-16 2019-11-19 北京金山数字娱乐科技有限公司 Gradient synchronous method and device during a kind of distribution is trained
CN112396154A (en) * 2019-08-16 2021-02-23 华东交通大学 Parallel method based on convolutional neural network training
CN110516795A (en) * 2019-08-28 2019-11-29 北京达佳互联信息技术有限公司 A kind of method, apparatus and electronic equipment for model variable allocation processing device
CN111008040B (en) * 2019-11-27 2022-06-14 星宸科技股份有限公司 Cache device and cache method, computing device and computing method
CN111008040A (en) * 2019-11-27 2020-04-14 厦门星宸科技有限公司 Cache device and cache method, computing device and computing method
CN112988366A (en) * 2019-12-12 2021-06-18 中科寒武纪科技股份有限公司 Parameter server, master client, and weight parameter processing method and system
CN113128531A (en) * 2019-12-30 2021-07-16 上海商汤智能科技有限公司 Data processing method and device
TWI763168B (en) * 2019-12-30 2022-05-01 大陸商上海商湯智能科技有限公司 Data processing method and apparatus, computer device, storage medium
CN113128531B (en) * 2019-12-30 2024-03-26 上海商汤智能科技有限公司 Data processing method and device
WO2021135810A1 (en) * 2019-12-30 2021-07-08 上海商汤智能科技有限公司 Data processing method and apparatus, computer device, storage medium, and computer program
WO2021136065A1 (en) * 2019-12-30 2021-07-08 中兴通讯股份有限公司 Deep learning method and apparatus, network device, and readable storage medium
CN113298223B (en) * 2020-02-24 2023-12-26 中科寒武纪科技股份有限公司 Data processing method, device, computer equipment and storage medium
CN113298223A (en) * 2020-02-24 2021-08-24 中科寒武纪科技股份有限公司 Data processing method, data processing device, computer equipment and storage medium
CN111324630A (en) * 2020-03-04 2020-06-23 中科弘云科技(北京)有限公司 MPI-based neural network architecture search parallelization method and equipment
CN111324630B (en) * 2020-03-04 2023-07-25 中科弘云科技(北京)有限公司 MPI-based neural network architecture search parallelization method and equipment
WO2021227293A1 (en) * 2020-05-09 2021-11-18 烽火通信科技股份有限公司 Universal training method and system for artificial intelligence models
CN111429142A (en) * 2020-06-10 2020-07-17 腾讯科技(深圳)有限公司 Data processing method and device and computer readable storage medium
CN111860828A (en) * 2020-06-15 2020-10-30 北京仿真中心 Neural network training method, storage medium and equipment
CN111695689A (en) * 2020-06-15 2020-09-22 中国人民解放军国防科技大学 Natural language processing method, device, equipment and readable storage medium
CN111860828B (en) * 2020-06-15 2023-11-28 北京仿真中心 Neural network training method, storage medium and equipment
CN111786688A (en) * 2020-06-16 2020-10-16 重庆邮电大学 Broadband parallel channelization receiving method based on embedded GPU
US11693654B2 (en) 2020-06-16 2023-07-04 Chongqing University Of Posts And Telecommunications Embedded GPU-based wideband parallel channelized receiving method
CN111786688B (en) * 2020-06-16 2021-12-03 重庆邮电大学 Broadband parallel channelization receiving method based on embedded GPU
WO2021253840A1 (en) * 2020-06-16 2021-12-23 重庆邮电大学 Embedded gpu-based wideband parallel channelized receiving method
CN112346704A (en) * 2020-11-23 2021-02-09 华中科技大学 Full-streamline type multiply-add unit array circuit for convolutional neural network
CN112346704B (en) * 2020-11-23 2021-09-17 华中科技大学 Full-streamline type multiply-add unit array circuit for convolutional neural network
CN113762456A (en) * 2020-11-26 2021-12-07 北京沃东天骏信息技术有限公司 Model parameter adjusting method and system
CN112598118B (en) * 2021-03-03 2021-06-25 成都晓多科技有限公司 Method, device, storage medium and equipment for processing abnormal labeling in supervised learning
CN112598118A (en) * 2021-03-03 2021-04-02 成都晓多科技有限公司 Method, device, storage medium and equipment for processing abnormal labeling in supervised learning
CN114221871A (en) * 2021-04-09 2022-03-22 无锡江南计算技术研究所 Full collection method of gridding flowing water
WO2023273579A1 (en) * 2021-06-30 2023-01-05 北京有竹居网络技术有限公司 Model training method and apparatus, speech recognition method and apparatus, and medium and device
CN114842837B (en) * 2022-07-04 2022-09-02 成都启英泰伦科技有限公司 Rapid acoustic model training method
CN114842837A (en) * 2022-07-04 2022-08-02 成都启英泰伦科技有限公司 Rapid acoustic model training method
CN116187426A (en) * 2022-11-09 2023-05-30 北京百度网讯科技有限公司 Model parameter multi-stream broadcasting method and device for deep learning model
CN116187426B (en) * 2022-11-09 2024-04-19 北京百度网讯科技有限公司 Model parameter multi-stream broadcasting method and device for deep learning model

Similar Documents

Publication Publication Date Title
CN104463324A (en) Convolution neural network parallel processing method based on large-scale high-performance cluster
Yu et al. Deep learning for determining a near-optimal topological design without any iteration
WO2021190127A1 (en) Data processing method and data processing device
Cuomo et al. A GPU-accelerated parallel K-means algorithm
WO2021190597A1 (en) Processing method for neural network model, and related device
Jain et al. Gems: Gpu-enabled memory-aware model-parallelism system for distributed dnn training
CN114492782B (en) On-chip core compiling and mapping method and device of neural network based on reinforcement learning
CN107636638A (en) Universal parallel computing architecture
CN114937151A (en) Lightweight target detection method based on multi-receptive-field and attention feature pyramid
CN110659723A (en) Data processing method, device, medium and electronic equipment based on artificial intelligence
US11144291B1 (en) Loop-oriented neural network compilation
CN112163601A (en) Image classification method, system, computer device and storage medium
Wang et al. Minerva: A scalable and highly efficient training platform for deep learning
CN112102165B (en) Light field image angular domain super-resolution system and method based on zero sample learning
Zhou et al. Octr: Octree-based transformer for 3d object detection
CN102024011A (en) Autonomous subsystem architecture
WO2020186061A1 (en) System and method for implementing modular universal reparameterization for deep multi-task learning across diverse domains
Li et al. Multi-task learning with deformable convolution
Liang Ascend AI Processor Architecture and Programming: Principles and Applications of CANN
Zhang et al. A Survey on Graph Neural Network Acceleration: Algorithms, Systems, and Customized Hardware
CN110399970A (en) Wavelet convolution wavelet neural network and intelligence analysis method and system
US11461662B1 (en) Compilation time reduction for memory and compute bound neural networks
Wan et al. Shift-BNN: Highly-efficient probabilistic Bayesian neural network training via memory-friendly pattern retrieving
Du Nguyen et al. Accelerating complex brain-model simulations on GPU platforms
Zhang et al. Enabling highly efficient capsule networks processing through software-hardware co-design

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (Application publication date: 20150325)