CN106991474A - Data exchange method and system for model-parallel fully connected layers of a deep neural network - Google Patents
Data exchange method and system for model-parallel fully connected layers of a deep neural network
- Publication number: CN106991474A
- Application number: CN201710191684.1A
- Authority: CN (China)
- Prior art keywords: sub, full, layer, connection layer, data
- Prior art date: 2017-03-28
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
Abstract
The invention discloses a data exchange method and system for model-parallel fully connected layers of a deep neural network. Each fully connected layer of the deep neural network is evenly divided over N training units according to its number of neurons, forming a model-parallel network of fully connected layers. During the forward propagation of a fully connected layer, a half-stop-and-wait forward propagation method is applied to the input data from the front layer, following a partial-arrival, partial-computation, whole-output and whole-propagation scheme. During the backward propagation of a fully connected layer, a fixed-stop-and-wait backward propagation method is applied to the residual data from the rear layer, following a quantitative-arrival, quantitative-computation and quantitative-propagation scheme. After one forward and one backward propagation, the weight data and threshold data of each layer are updated in parallel according to the computed weight gradients and threshold gradients. Data communication and data computation of the fully connected layers can thereby be overlapped, accelerating model convergence while ensuring accuracy.
Description
Technical Field
The invention belongs to the technical field of deep learning, and particularly relates to a data exchange method and system for model-parallel fully connected layers in a deep neural network.
Background
A Deep Neural Network (DNN) is an Artificial Neural Network (ANN) composed of an input layer, multiple hidden layers and an output layer. Each layer consists of a number of neuron nodes, and the neuron nodes of a front layer are connected to the neuron nodes of the rear layer, as shown in Fig. 1. In Fig. 1 all layers reside on the same training unit; I denotes the input layer, H denotes the hidden layers (of which there are several), O denotes the output layer, thin lines denote connections between neurons, and thick lines denote connections between components (here, layers). In such a network model, a Fully-Connected Layer (denoted "FC") is a layer in which every node is connected to every node of the adjacent layers.
As training data sets grow, the training parameters of a fully connected layer (connection weight parameters and threshold parameters, the latter also called bias parameters) often exceed the memory of a single training unit during model training of a deep neural network (a training unit is an independent computing node, e.g. a GPU card or a server node). The fully connected layer therefore has to be split into N parts, each consisting of a subset of the neuron nodes and the training parameters attached to them. The N training units, distributed over one or more hosts, each hold their part of the training parameters and cooperate to complete training, as shown in Fig. 2; this constitutes the model-parallel training mode of the fully connected layers of a deep neural network.
Communication overhead arises when the input of a neuron comes from the output of a neuron on another training unit. In Fig. 2, when the input of a neuron on training unit GPU2 must come from the output of a neuron on training unit GPU1, that output has to be copied from GPU1 to GPU2, which incurs communication overhead. In the standard propagation method of a deep neural network, computation and communication are strictly serialized for both forward and backward propagation. Taking standard forward propagation as an example (standard backward propagation is analogous), its core idea is as follows:
(1) waiting for data: the training unit waits for the output data of all source training units (training units that generate data) to arrive;
(2) whole output: the training unit computes the output data of the current layer;
(3) whole propagation: the training unit propagates the output data to the target training units (training units that receive data) as their input data. A minimal sketch of this fully synchronous scheme is given after this list.
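For illustration only, the standard scheme described above can be sketched in Python roughly as follows; the queue-based transport and every name in it (recv_queues, send_to and so on) are assumptions of this sketch rather than anything defined in the patent, and NumPy stands in for the actual GPU kernels.

```python
# Illustrative sketch of the STANDARD forward propagation described above
# (wait for all sources, compute the whole output, then propagate).
import numpy as np

def standard_forward(recv_queues, weights, bias, targets, activation=np.tanh):
    """recv_queues: one blocking queue per source training unit, each yielding
    that source's output data (a batch x neurons array);
    weights: dict source_id -> weight block connecting that source to this unit."""
    # (1) waiting for data: block until EVERY source has delivered its output,
    #     so the slowest source gates all further work.
    arrived = {src: q.get() for src, q in recv_queues.items()}
    # (2) whole output: only now compute this unit's input and output data.
    layer_input = sum(arrived[src] @ weights[src] for src in arrived)
    layer_output = activation(layer_input + bias)
    # (3) whole propagation: send the output to every target training unit.
    for send_to in targets:
        send_to(layer_output)
    return layer_output
```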
However, this method has the following drawback: because the training units are distributed over one or more hosts, the output data of different source training units arrive at the target training unit at different rates. The target training unit must wait for the data of the slowest source training unit before it can continue with forward propagation, which increases the communication overhead.
Disclosure of Invention
Aiming at the above defects or improvement needs of the prior art, the invention provides a data exchange method and system for model-parallel fully connected layers of a deep neural network, which overlap the data communication and data computation of the fully connected layers and accelerate model convergence while ensuring accuracy, thereby solving the technical problem of high communication overhead in the prior art.
To achieve the above object, according to one aspect of the present invention, there is provided a data exchange method for model-parallel fully connected layers of a deep neural network, comprising:
(1) for each fully connected layer FC_l, l ∈ [1, L]: dividing FC_l into N equal parts according to its number of neurons to obtain N sub fully connected layers, and assigning the divided sub fully connected layers to N training units respectively, where L is the number of fully connected layers;
(2) in the forward propagation process of each sub fully connected layer, obtaining the output data of each sub fully connected layer in parallel using a half-stop-and-wait forward propagation method;
(3) in the backward propagation process of each sub fully connected layer, obtaining the weight gradient and threshold gradient of each sub fully connected layer in parallel using a fixed-stop-and-wait backward propagation method, based on the output data of each sub fully connected layer obtained by the half-stop-and-wait forward propagation method;
(4) after one forward propagation and one backward propagation are finished, updating the weight data and threshold data of each sub fully connected layer in parallel using the weight gradient and threshold gradient of each sub fully connected layer.
Preferably, the step (2) specifically comprises:
(2.1) for each sub fully connected layer FC_l^i: whenever the output data of any sub fully connected layer FC_{l-1}^j has arrived, computing the input data that FC_{l-1}^j generates for FC_l^i by the formula ID_l^{j→i} = OD_{l-1}^j · W_l^{ji}, where l indexes the fully connected layer, j and i index the sub fully connected layers, W_l^{ji} denotes the connection weight between FC_{l-1}^j and FC_l^i, OD_{l-1}^j denotes the output data of FC_{l-1}^j, and ID_l^{j→i} denotes the input data that FC_{l-1}^j generates for FC_l^i;
(2.2) for the sub fully connected layer FC_l^i, according to the result of step (2.1), computing its overall input data by the formula ID_l^i = Σ_{j=1}^{N} ID_l^{j→i}, where ID_l^i denotes the overall input data of FC_l^i;
(2.3) for the sub fully connected layer FC_l^i, according to the result of step (2.2), computing its output data by the formula OD_l^i = F(ID_l^i + B_l^i), where the function F denotes a nonlinear activation function and B_l^i denotes the threshold data of FC_l^i.
Preferably, step (3) specifically comprises:
(3.1) for each sub fully connected layer FC_l^i: each time after the output residual data generated for FC_l^i by the sub fully connected layers of the rear layer on Q training units have arrived, taking the Q pieces of output residual data as input residual data of FC_l^i, denoted IΔ_l^{q→i}, q = 1, …, Q;
(3.2) for the sub fully connected layer FC_l^i, accumulating the Q pieces of input residual data of step (3.1) by the formula IΔ_l^i += Σ_{q=1}^{Q} IΔ_l^{q→i};
(3.3) for the sub fully connected layer FC_l^i, according to the result of step (3.2), computing in parallel the output residual data from FC_l^i to FC_{l-1}^j, denoted OΔ_{l-1}^{i→j}, by the formula OΔ_{l-1}^{i→j} = IΔ_l^i · (W_l^{ji})^T;
(3.4) for the sub fully connected layer FC_l^i, according to the result of step (3.1), computing in parallel the weight gradient of FC_{l-1}^j to FC_l^i, denoted ∇W_l^{ji}, by the formula ∇W_l^{ji} += (OD_{l-1}^j)^T · Σ_{q=1}^{Q} IΔ_l^{q→i};
(3.5) for the sub fully connected layer FC_l^i, according to the result of step (3.2), computing the threshold gradient of FC_l^i, denoted ∇B_l^i, by the formula ∇B_l^i = V^T · IΔ_l^i, where V is a unit vector whose dimension equals the size of a batch block in training;
(3.6) for the sub fully connected layer FC_l^i, repeating steps (3.1) to (3.5), each time processing the output residual data generated for FC_l^i by Q sub fully connected layers of the rear layer, until all the output residual data of the rear layer have been processed.
Preferably, the step (4) specifically comprises:
(4.1) updating the weight data of each sub fully connected layer in parallel by the formula W_l^{ji} = W_l^{ji} − η · ∇W_l^{ji}, where η denotes the learning rate;
(4.2) updating the threshold data of each sub fully connected layer in parallel by the formula B_l^i = B_l^i − η · ∇B_l^i.
According to another aspect of the present invention, there is provided a data exchange system for model-parallel fully connected layers of a deep neural network, comprising:
a partitioning module, for dividing each fully connected layer FC_l, l ∈ [1, L], into N equal parts according to its number of neurons to obtain N sub fully connected layers, and assigning the divided sub fully connected layers to N training units respectively, where L is the number of fully connected layers;
a forward propagation module, for obtaining the output data of each sub fully connected layer in parallel using the half-stop-and-wait forward propagation method during the forward propagation of each sub fully connected layer;
a backward propagation module, for obtaining the weight gradient and threshold gradient of each sub fully connected layer in parallel using the fixed-stop-and-wait backward propagation method during the backward propagation of each sub fully connected layer, based on the output data of each sub fully connected layer obtained by the half-stop-and-wait forward propagation method;
and an updating module, for updating the weight data and threshold data of each sub fully connected layer in parallel, using the weight gradient and threshold gradient of each sub fully connected layer, after one forward propagation and one backward propagation are finished.
Generally, compared with the prior art, the above technical solutions conceived by the present invention mainly have the following technical advantages:
(1) the calculation parallelism is high: each training unit processes the current data in parallel;
(2) the communication overhead is small: both the half-stop-and-wait forward propagation method and the fixed-stop-and-wait backward propagation method maximize the overlap of computation and communication time during deep neural network training, reducing the communication overhead of training.
Drawings
FIG. 1 is a schematic diagram of a deep neural network architecture in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a fully-connected layer structure of model parallelism in a deep neural network in an embodiment of the present invention;
FIG. 3 is a schematic flow chart of the overall process in an embodiment of the invention;
FIG. 4 is a flow chart of the half-stop-and-wait forward propagation method in an embodiment of the present invention;
FIG. 5 is a flow chart of the fixed-stop-and-wait backward propagation method in an embodiment of the present invention;
fig. 6 is a flowchart illustrating a method for updating weight data and threshold data according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention consists of two parts, a half-stop-and-wait forward propagation method and a fixed-stop-and-wait backward propagation method. The core idea of the half-stop-and-wait forward propagation method is as follows:
(1) partial computation: the training unit computes partial input data (ID) from the output data of the front layer that has already arrived;
(2) whole output: the training unit combines all partial results of step (1) into its overall input data and computes its output data (OD) from them;
(3) whole propagation: the training unit propagates the output data to the target training units as their input data. A minimal sketch of this idea is given after this list.
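A minimal Python sketch of the half-stop-and-wait idea is given below, assuming the same hypothetical queue-based transport as in the earlier sketch; the point is that the partial computation of step (1) overlaps with the communication of the sources that have not yet arrived.

```python
# Hedged sketch of the half-stop-and-wait forward propagation idea: partial
# computation starts as soon as each front-layer output arrives, instead of
# waiting for all of them. Names and the queue transport are assumptions.
import numpy as np

def half_stop_wait_forward(arrival_queue, weights, bias, targets,
                           n_sources, activation=np.tanh):
    """arrival_queue yields (source_id, output_data) pairs in arrival order;
    weights: dict source_id -> weight block connecting that source to this unit."""
    layer_input = None
    for _ in range(n_sources):
        src, od = arrival_queue.get()       # whichever source happens to arrive first
        partial = od @ weights[src]         # (1) partial computation, overlapped with the
                                            #     communication of the slower sources
        layer_input = partial if layer_input is None else layer_input + partial
    layer_output = activation(layer_input + bias)   # (2) whole output
    for send_to in targets:                          # (3) whole propagation
        send_to(layer_output)
    return layer_output
```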
The core idea of the fixed-stop-and-wait backward propagation method is as follows:
(1) quantitative computation: each time after Q pieces of the residual data output by the rear layer have arrived (Q ∈ [1, N], where Q is a constant set by the user and N is the number of training units), the training unit performs the corresponding computation on them (residual data are also called error data and are denoted Δ below);
(2) quantitative propagation: the training unit propagates the computation result of step (1) to the target training units as their input residual data (IΔ);
(3) repeating step (1) and step (2) until all N pieces of output residual data (OΔ) of the rear layer have been processed. A minimal sketch of this loop is given after this list.
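The following sketch shows one way the fixed-stop-and-wait loop could be organized, again assuming a blocking queue as transport; Q (q_size), the dictionary-based weight slices and the per-chunk downstream propagation are illustrative choices of this sketch, not a definitive reading of the patent.

```python
# Hedged sketch of the fixed-stop-and-wait loop: residual data from the rear
# layer are processed in groups of Q as they arrive, rather than in one whole
# batch at the end.
def fixed_stop_wait_loop(residual_queue, weights_t, targets, n_sources, q_size):
    """residual_queue yields output residual arrays (O-delta) from the N rear
    sub-layers; weights_t maps each front sub-layer j to W^T for this unit;
    targets maps j to a send function toward that front sub-layer."""
    processed = 0
    accumulated = None
    while processed < n_sources:                     # (3) repeat until all N parts handled
        q = min(q_size, n_sources - processed)
        chunk = sum(residual_queue.get() for _ in range(q))       # (1) quantitative computation
        accumulated = chunk if accumulated is None else accumulated + chunk
        for j, send_to in targets.items():                        # (2) quantitative propagation
            send_to(chunk @ weights_t[j])    # partial input residual for front sub-layer j
        processed += q
    return accumulated           # total input residual, later used for the gradients
```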
The overall idea of the invention is that, during model-parallel training of a deep neural network, the half-stop-and-wait forward propagation method replaces the standard forward propagation method and the fixed-stop-and-wait backward propagation method replaces the standard backward propagation method, so that computation and communication time during training overlap and the training communication overhead is reduced.
Fig. 3 is a schematic general flow chart of the method in the embodiment of the present invention, and the method shown in fig. 3 includes:
(1) for each fully connected layer FC_l, l ∈ [1, L]: dividing FC_l into N equal parts according to its number of neurons to obtain N sub fully connected layers, and assigning the divided sub fully connected layers to N training units respectively, where L is the number of fully connected layers;
A fully connected layer FC_l, l ∈ [1, L], is divided by its number of neurons into N equal parts, giving N sub fully connected layers denoted FC_l^1, FC_l^2, …, FC_l^N, which are assigned to N training units (a training unit is an independent computing node, e.g. a GPU card or a server node). The other fully connected layers are processed in the same way, forming a model-parallel network of fully connected layers in the deep neural network, as shown in Fig. 2: each fully connected layer is divided into N parts held by N training units respectively, where N is the number of training units (i.e. the number of parts into which each fully connected layer is divided) and L is the number of fully connected layers; thin lines denote connections between neurons, and thick lines denote connections between components (here, one part of one layer). A partitioning sketch under assumed layer sizes is given below.
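The neuron-wise split of step (1) can be illustrated by the following sketch, in which a plain dictionary stands in for a training unit and the layer sizes are hypothetical; each unit i ends up holding the weight and threshold slice of its sub fully connected layer FC_l^i.

```python
# Hedged sketch of the neuron-wise partitioning: an L-layer stack of fully
# connected layers is split into N equal column blocks, one block per training
# unit. Shapes and the dict-based "training unit" are assumptions.
import numpy as np

def partition_fc_layers(layer_sizes, n_units, rng=np.random.default_rng(0)):
    """layer_sizes: [d_0, d_1, ..., d_L]; each d_l must be divisible by n_units.
    Returns units[i][l] = {'W': weight slice, 'B': threshold slice} held by
    training unit i, i.e. the sub fully connected layer FC_l^i."""
    units = [dict() for _ in range(n_units)]
    for l in range(1, len(layer_sizes)):
        d_in, d_out = layer_sizes[l - 1], layer_sizes[l]
        cols = d_out // n_units                      # equal split by neuron count
        W = rng.standard_normal((d_in, d_out)) * 0.01
        B = np.zeros(d_out)
        for i in range(n_units):
            sl = slice(i * cols, (i + 1) * cols)
            units[i][l] = {"W": W[:, sl].copy(), "B": B[sl].copy()}
    return units

# e.g. three fully connected layers of 1024 neurons split over 4 training units:
# units = partition_fc_layers([1024, 1024, 1024, 1024], n_units=4)
```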
(2) In the forward propagation process of each sub fully connected layer, the output data of each sub fully connected layer is obtained in parallel using the half-stop-and-wait forward propagation method;
the half-stop equal forward propagation method shown in fig. 4 specifically includes:
(2.1) for the sub fully connected layer FC_l^i: whenever the output data of any sub fully connected layer FC_{l-1}^j has arrived, the input data that FC_{l-1}^j generates for FC_l^i is computed as
ID_l^{j→i} = OD_{l-1}^j · W_l^{ji},
where l indexes the fully connected layer and j and i index the sub fully connected layers, i.e. the training units; W_l^{ji} denotes the connection weight between FC_{l-1}^j and FC_l^i, OD_{l-1}^j denotes the output data of FC_{l-1}^j, and ID_l^{j→i} denotes the input data that FC_{l-1}^j generates for FC_l^i;
(2.2) for the sub fully connected layer FC_l^i, according to the result of step (2.1), the overall input data is computed as
ID_l^i = Σ_{j=1}^{N} ID_l^{j→i},
where ID_l^i denotes the overall input data of FC_l^i;
(2.3) for the sub fully connected layer FC_l^i, according to the result of step (2.2), the final output data is computed as
OD_l^i = F(ID_l^i + B_l^i),
where OD_l^i denotes the output data of FC_l^i, the function F is a nonlinear activation function (e.g. the ReLU function), and B_l^i is the threshold data of FC_l^i;
(2.4) the remaining sub fully connected layers FC_l^k (k ≠ i) are processed in the same way in parallel according to steps (2.1) to (2.3). A NumPy sketch of steps (2.1) to (2.3) is given below.
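For concreteness, steps (2.1) to (2.3) for a single sub fully connected layer can be written in a few NumPy lines; ReLU as the activation F and the dictionary layout of the weight blocks W_l^{ji} are assumptions of this sketch, since the formula images of the original publication are not reproduced in this text.

```python
# Steps (2.1)-(2.3) for a single sub fully connected layer FC_l^i, as a sketch.
import numpy as np

def sub_layer_forward(front_outputs, W, B):
    """front_outputs: dict j -> OD_{l-1}^j (batch x neurons_j), in arrival order;
    W: dict j -> W_l^{ji} (neurons_j x neurons_i); B: threshold data B_l^i."""
    batch = next(iter(front_outputs.values())).shape[0]
    ID = np.zeros((batch, B.shape[0]))
    for j, OD_prev in front_outputs.items():   # (2.1) partial input data as each OD arrives,
        ID += OD_prev @ W[j]                   # ID_l^{j->i} = OD_{l-1}^j . W_l^{ji}, summed (2.2)
    OD = np.maximum(ID + B, 0.0)               # (2.3) OD_l^i = F(ID_l^i + B_l^i), F = ReLU here
    return ID, OD                              # ID is kept for the backward pass
```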
(3) In the backward propagation process of each sub fully connected layer, the weight gradient and threshold gradient of each sub fully connected layer are obtained in parallel using the fixed-stop-and-wait backward propagation method, based on the output data of each sub fully connected layer obtained by the half-stop-and-wait forward propagation method;
the stop-and-go back propagation method shown in fig. 5 specifically includes:
(3.1) for the sub fully connected layer FC_l^i: each time after the output residual data generated for FC_l^i by the sub fully connected layers of the rear layer on Q training units have arrived (i.e. have been copied from the source training units to the target training unit), the Q pieces of output residual data are taken as input residual data of FC_l^i, denoted IΔ_l^{q→i}, q = 1, …, Q;
(3.2) for the sub fully connected layer FC_l^i, the Q pieces of input residual data of step (3.1) are accumulated into IΔ_l^i by the formula
IΔ_l^i += Σ_{q=1}^{Q} IΔ_l^{q→i};
(3.3) for the sub fully connected layer FC_l^i, according to the result of step (3.2), the output residual data from FC_l^i to FC_{l-1}^j is computed in parallel, denoted OΔ_{l-1}^{i→j}, by the formula
OΔ_{l-1}^{i→j} = IΔ_l^i · (W_l^{ji})^T;
(3.4) for the sub fully connected layer FC_l^i, according to the result of step (3.1), the weight gradient of FC_{l-1}^j to FC_l^i is computed in parallel, denoted ∇W_l^{ji}, by the formula
∇W_l^{ji} += (OD_{l-1}^j)^T · Σ_{q=1}^{Q} IΔ_l^{q→i};
(3.5) for the sub fully connected layer FC_l^i, according to the result of step (3.2), the threshold gradient of FC_l^i is computed, denoted ∇B_l^i, by the formula
∇B_l^i = V^T · IΔ_l^i,
where V is a unit vector whose dimension equals the size of the batch block in training;
(3.6) for the sub fully connected layer FC_l^i, steps (3.1) to (3.5) are repeated, each time processing the output residual data generated for FC_l^i by Q sub fully connected layers of the rear layer, until all the output residual data of the rear layer have been processed;
(3.7) the remaining sub fully connected layers are processed in the same way in parallel according to steps (3.1) to (3.6). A sketch of steps (3.1) to (3.6) is given below.
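A sketch of steps (3.1) to (3.6) for one sub fully connected layer follows; the explicit gradient formulas are the standard fully connected backpropagation expressions, reconstructed here because the original formula images are not reproduced, and propagating the per-chunk contribution downstream (rather than the running accumulation) is one consistent way to realize step (3.3).

```python
# Sketch of steps (3.1)-(3.6) for one sub fully connected layer FC_l^i.
import numpy as np

def sub_layer_backward(residual_queue, W, front_OD, send_residual, n_rear, Q):
    """residual_queue yields output residual chunks from the rear sub-layers;
    W: dict j -> W_l^{ji}; front_OD: dict j -> OD_{l-1}^j cached in the forward pass;
    send_residual(j, data) forwards the output residual toward FC_{l-1}^j."""
    I_delta = None                                   # accumulated input residual of FC_l^i
    grad_W = {j: np.zeros_like(W[j]) for j in W}
    done = 0
    while done < n_rear:
        q = min(Q, n_rear - done)
        chunk = sum(residual_queue.get() for _ in range(q))        # (3.1) Q arrived residual parts
        I_delta = chunk if I_delta is None else I_delta + chunk    # (3.2) accumulation
        for j in W:
            send_residual(j, chunk @ W[j].T)         # (3.3) output residual toward FC_{l-1}^j
            grad_W[j] += front_OD[j].T @ chunk       # (3.4) weight gradient contribution
        done += q                                    # (3.6) repeat until all N parts handled
    grad_B = I_delta.sum(axis=0)                     # (3.5) threshold gradient: V^T . I_delta
    return grad_W, grad_B
```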
(4) After one forward propagation and one backward propagation are finished, the weight data and threshold data of each sub fully connected layer are updated in parallel using the weight gradient and threshold gradient of each sub fully connected layer.
The method for updating the weight data and the threshold data shown in fig. 6 specifically includes:
(4.1) for all sub fully connected layers FC_l^i, the weight data are updated in parallel by the formula W_l^{ji} = W_l^{ji} − η · ∇W_l^{ji}, where η denotes the learning rate;
(4.2) for all sub fully connected layers FC_l^i, the threshold data are updated in parallel by the formula B_l^i = B_l^i − η · ∇B_l^i. A short sketch of this update follows.
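Steps (4.1) and (4.2) amount to a plain SGD update applied by each training unit to its own parameter slice, which is why they can run fully in parallel; a minimal sketch with assumed variable names is:

```python
# Sketch of steps (4.1)-(4.2): each training unit updates its own slice
# independently with plain SGD.
def sub_layer_update(W, B, grad_W, grad_B, lr):
    """W: dict j -> W_l^{ji}; B: B_l^i; lr is the learning rate eta."""
    for j in W:
        W[j] -= lr * grad_W[j]     # (4.1) W_l^{ji} = W_l^{ji} - eta * grad W_l^{ji}
    B -= lr * grad_B               # (4.2) B_l^i  = B_l^i  - eta * grad B_l^i
    return W, B
```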
In one embodiment of the invention, a data exchange system for model-parallel fully connected layers of a deep neural network is disclosed, comprising:
a partitioning module, for dividing each fully connected layer FC_l, l ∈ [1, L], into N equal parts according to its number of neurons to obtain N sub fully connected layers, and assigning the divided sub fully connected layers to N training units respectively, where L is the number of fully connected layers;
a forward propagation module, for obtaining the output data of each sub fully connected layer in parallel using the half-stop-and-wait forward propagation method during the forward propagation of each sub fully connected layer;
a backward propagation module, for obtaining the weight gradient and threshold gradient of each sub fully connected layer in parallel using the fixed-stop-and-wait backward propagation method during the backward propagation of each sub fully connected layer, based on the output data of each sub fully connected layer obtained by the half-stop-and-wait forward propagation method;
and an updating module, for updating the weight data and threshold data of each sub fully connected layer in parallel, using the weight gradient and threshold gradient of each sub fully connected layer, after one forward propagation and one backward propagation are finished.
The specific implementation of each module may refer to the description of the method embodiment, and the embodiment of the present invention will not be repeated.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (5)
1. A data exchange method for model-parallel fully connected layers of a deep neural network, characterized by comprising the following steps:
(1) for each fully connected layer FC_l, l ∈ [1, L]: dividing FC_l into N equal parts according to its number of neurons to obtain N sub fully connected layers, and assigning the divided sub fully connected layers to N training units respectively, where L is the number of fully connected layers;
(2) in the forward propagation process of each sub fully connected layer, obtaining the output data of each sub fully connected layer in parallel using a half-stop-and-wait forward propagation method;
(3) in the backward propagation process of each sub fully connected layer, obtaining the weight gradient and threshold gradient of each sub fully connected layer in parallel using a fixed-stop-and-wait backward propagation method, based on the output data of each sub fully connected layer obtained by the half-stop-and-wait forward propagation method;
(4) after one forward propagation and one backward propagation are finished, updating the weight data and threshold data of each sub fully connected layer in parallel using the weight gradient and threshold gradient of each sub fully connected layer.
2. The method according to claim 1, wherein step (2) comprises in particular:
(2.1) for each sub fully connected layer FC_l^i: whenever the output data of any sub fully connected layer FC_{l-1}^j has arrived, computing the input data that FC_{l-1}^j generates for FC_l^i by the formula ID_l^{j→i} = OD_{l-1}^j · W_l^{ji}, where l indexes the fully connected layer, j and i index the sub fully connected layers, W_l^{ji} denotes the connection weight between FC_{l-1}^j and FC_l^i, OD_{l-1}^j denotes the output data of FC_{l-1}^j, and ID_l^{j→i} denotes the input data that FC_{l-1}^j generates for FC_l^i;
(2.2) for the sub fully connected layer FC_l^i, according to the result of step (2.1), computing its overall input data by the formula ID_l^i = Σ_{j=1}^{N} ID_l^{j→i}, where ID_l^i denotes the overall input data of FC_l^i;
(2.3) for the sub fully connected layer FC_l^i, according to the result of step (2.2), computing its output data by the formula OD_l^i = F(ID_l^i + B_l^i), where the function F denotes a nonlinear activation function and B_l^i denotes the threshold data of FC_l^i.
3. The method according to claim 2, wherein step (3) comprises in particular:
(3.1) for each sub fully connected layer FC_l^i: each time after the output residual data generated for FC_l^i by the sub fully connected layers of the rear layer on Q training units have arrived, taking the Q pieces of output residual data as input residual data of FC_l^i, denoted IΔ_l^{q→i}, q = 1, …, Q;
(3.2) for the sub fully connected layer FC_l^i, accumulating the Q pieces of input residual data of step (3.1) by the formula IΔ_l^i += Σ_{q=1}^{Q} IΔ_l^{q→i};
(3.3) for the sub fully connected layer FC_l^i, according to the result of step (3.2), computing in parallel the output residual data from FC_l^i to FC_{l-1}^j, denoted OΔ_{l-1}^{i→j}, by the formula OΔ_{l-1}^{i→j} = IΔ_l^i · (W_l^{ji})^T;
(3.4) for the sub fully connected layer FC_l^i, according to the result of step (3.1), computing in parallel the weight gradient of FC_{l-1}^j to FC_l^i, denoted ∇W_l^{ji}, by the formula ∇W_l^{ji} += (OD_{l-1}^j)^T · Σ_{q=1}^{Q} IΔ_l^{q→i};
(3.5) for the sub fully connected layer FC_l^i, according to the result of step (3.2), computing the threshold gradient of FC_l^i, denoted ∇B_l^i, by the formula ∇B_l^i = V^T · IΔ_l^i, where V is a unit vector whose dimension equals the size of a batch block in training;
(3.6) for the sub fully connected layer FC_l^i, repeating steps (3.1) to (3.5), each time processing the output residual data generated for FC_l^i by Q sub fully connected layers of the rear layer, until all the output residual data of the rear layer have been processed.
4. The method according to claim 3, characterized in that step (4) comprises in particular:
(4.1) updating the weight data of each sub fully connected layer in parallel by the formula W_l^{ji} = W_l^{ji} − η · ∇W_l^{ji}, where η denotes the learning rate;
(4.2) updating the threshold data of each sub fully connected layer in parallel by the formula B_l^i = B_l^i − η · ∇B_l^i.
5. A data exchange system for model-parallel fully connected layers of a deep neural network, characterized by comprising:
a partitioning module, for dividing each fully connected layer FC_l, l ∈ [1, L], into N equal parts according to its number of neurons to obtain N sub fully connected layers, and assigning the divided sub fully connected layers to N training units respectively, where L is the number of fully connected layers;
a forward propagation module, for obtaining the output data of each sub fully connected layer in parallel using the half-stop-and-wait forward propagation method during the forward propagation of each sub fully connected layer;
a backward propagation module, for obtaining the weight gradient and threshold gradient of each sub fully connected layer in parallel using the fixed-stop-and-wait backward propagation method during the backward propagation of each sub fully connected layer, based on the output data of each sub fully connected layer obtained by the half-stop-and-wait forward propagation method;
and an updating module, for updating the weight data and threshold data of each sub fully connected layer in parallel, using the weight gradient and threshold gradient of each sub fully connected layer, after one forward propagation and one backward propagation are finished.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201710191684.1A, CN106991474B (en) | 2017-03-28 | 2017-03-28 | Data exchange method and system for model-parallel fully connected layers of a deep neural network
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201710191684.1A, CN106991474B (en) | 2017-03-28 | 2017-03-28 | Data exchange method and system for model-parallel fully connected layers of a deep neural network
Publications (2)
Publication Number | Publication Date |
---|---|
CN106991474A true CN106991474A (en) | 2017-07-28 |
CN106991474B CN106991474B (en) | 2019-09-24 |
Family ID: 59413391
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710191684.1A (granted as CN106991474B, Active) | Data exchange method and system for model-parallel fully connected layers of a deep neural network | 2017-03-28 | 2017-03-28
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106991474B (en) |
- 2017-03-28: CN application CN201710191684.1A, granted as patent CN106991474B (en), status: Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150058268A1 (en) * | 2012-01-27 | 2015-02-26 | International Business Machines Corporation | Hierarchical scalable neuromorphic synaptronic system for synaptic and structural plasticity |
CN104035751A (en) * | 2014-06-20 | 2014-09-10 | 深圳市腾讯计算机系统有限公司 | Graphics processing unit based parallel data processing method and device |
CN104463324A (en) * | 2014-11-21 | 2015-03-25 | 长沙马沙电子科技有限公司 | Convolution neural network parallel processing method based on large-scale high-performance cluster |
US20160267380A1 (en) * | 2015-03-13 | 2016-09-15 | Nuance Communications, Inc. | Method and System for Training a Neural Network |
CN105630882A (en) * | 2015-12-18 | 2016-06-01 | 哈尔滨工业大学深圳研究生院 | Remote sensing data deep learning based offshore pollutant identifying and tracking method |
CN106228240A (en) * | 2016-07-30 | 2016-12-14 | 复旦大学 | Degree of depth convolutional neural networks implementation method based on FPGA |
Non-Patent Citations (2)
Title |
---|
CHEN X ET AL.: "Pipelined Back-Propagation for Context-Dependent Deep Neural Networks", INTERSPEECH *
王裕民 (WANG Yumin): "Parallel algorithms for convolutional neural networks in a multi-GPU environment", China Master's Theses Full-text Database, Information Science and Technology *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11961001B2 (en) | 2017-12-15 | 2024-04-16 | Nvidia Corporation | Parallel forward and backward propagation |
CN109408175A (en) * | 2018-09-28 | 2019-03-01 | 北京赛博贝斯数据科技有限责任公司 | Real-time interaction method and system in general high-performance deep learning computing engines |
CN109711358A (en) * | 2018-12-28 | 2019-05-03 | 四川远鉴科技有限公司 | Neural network training method, face identification method and system and storage medium |
CN109976903A (en) * | 2019-02-22 | 2019-07-05 | 华中科技大学 | A kind of deep learning Heterogeneous Computing method and system based on slice width Memory Allocation |
US11568268B2 (en) | 2019-02-22 | 2023-01-31 | Huazhong University Of Science And Technology | Deep learning heterogeneous computing method based on layer-wide memory allocation and system thereof |
WO2022018548A1 (en) * | 2020-07-21 | 2022-01-27 | International Business Machines Corporation | Online training of neural networks |
GB2612504A (en) * | 2020-07-21 | 2023-05-03 | Ibm | Online training of neural networks |
CN112418168A (en) * | 2020-12-10 | 2021-02-26 | 深圳云天励飞技术股份有限公司 | Vehicle identification method, device, system, electronic equipment and storage medium |
CN112418168B (en) * | 2020-12-10 | 2024-04-02 | 深圳云天励飞技术股份有限公司 | Vehicle identification method, device, system, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN106991474B (en) | 2019-09-24 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |