CN106991474A - Deep neural network model-parallel fully-connected layer data exchange method and system - Google Patents

Deep neural network model-parallel fully-connected layer data exchange method and system

Info

Publication number
CN106991474A
CN106991474A (application CN201710191684.1A)
Authority
CN
China
Prior art keywords
sub
full
layer
connection layer
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710191684.1A
Other languages
Chinese (zh)
Other versions
CN106991474B (en)
Inventor
蒋文斌
金海
张杨松
叶阁焰
马阳
祝简
刘湃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201710191684.1A priority Critical patent/CN106991474B/en
Publication of CN106991474A publication Critical patent/CN106991474A/en
Application granted granted Critical
Publication of CN106991474B publication Critical patent/CN106991474B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a model-parallel fully-connected layer data exchange method and system for deep neural networks. Each fully-connected layer of the deep neural network is divided evenly by number of neurons across N training units, forming a model-parallel network in which the fully-connected layers are held jointly by the N units. During the forward propagation of a fully-connected layer, a semi-stop-and-wait forward propagation method is applied to the input data coming from the front layer: data that have partially arrived are partially calculated, and the layer then produces its overall output and propagates it as a whole. During the backward propagation of a fully-connected layer, a fixed-stop-and-wait backward propagation method is applied to the residual data coming from the rear layer: the residual data are received, calculated and propagated in fixed quantities. After one forward and one backward pass, the weight data and threshold data of each layer are updated in parallel from the computed weight gradients and threshold gradients. The method overlaps the data communication and the data calculation of the fully-connected layers and accelerates model convergence while preserving accuracy.

Description

Deep neural network model-parallel fully-connected layer data exchange method and system
Technical Field
The invention belongs to the technical field of deep learning, and particularly relates to a fully-connected layer data exchange method and system for model parallelism in deep neural networks.
Background
A Deep Neural Network (DNN) is an Artificial Neural Network (ANN) composed of an input layer, several hidden layers and an output layer. Each layer consists of a number of neuron nodes, and the neuron nodes of a front layer are connected with the neuron nodes of the following layer, as shown in fig. 1. In fig. 1 all layers reside on the same training unit; I denotes the input layer, H denotes a hidden layer (there may be several hidden layers), O denotes the output layer, thin lines denote connections between neurons, and thick lines denote connections between components (here, between layers). In such a network model, a fully-connected layer (Fully-Connected Layer, denoted "FC") is a layer in which every node is connected to every node of the adjacent layers.
As training data sets grow, the training parameters of the fully-connected layers (connection weight parameters and threshold parameters, the latter also called bias parameters) in deep neural network training often exceed the memory of a single training unit (a training unit is an independent computing node, which may be a GPU card or a server node). Each fully-connected layer is therefore split into N parts, each consisting of a subset of the neuron nodes and the training parameters attached to them, and N training units distributed over one or more hosts each hold one part and cooperate to complete the training, as shown in fig. 2. This constitutes the model-parallel training mode of the fully-connected layers of a deep neural network.
Communication overhead arises when the input of a neuron comes from the output of a neuron on another training unit. In fig. 2, when the input of a neuron on training unit GPU2 must come from the output of a neuron on training unit GPU1, that output has to be copied from GPU1 to GPU2, which incurs communication overhead. In the standard propagation method of a deep neural network, computation and communication are strictly serialized, in both forward and backward propagation. Taking the standard forward propagation as an example (the standard backward propagation is analogous), its core idea is as follows:
(1) waiting for data: the training unit waits for the output data of all source training units (the training units that produce its data) to arrive;
(2) overall output: the training unit computes the output data of the current layer;
(3) overall propagation: the training unit propagates the output data to the target training units (the training units that receive the data) as their input data.
However, the above method has the following drawback: because the training units are distributed over one or more hosts, the output data of different source training units reach the target training unit at different rates. The target training unit must wait for the data of the slowest source training unit before it can start its next work (continue the forward propagation), which increases the communication overhead.
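For contrast with the method of the invention, the following Python/NumPy sketch (not part of the patent) restates the standard three-step forward propagation described above; recv_output and send_to are hypothetical communication helpers standing in for whatever transport copies data between training units, and weights holds one weight block per source training unit.

import numpy as np

def standard_forward_step(weights, bias, source_units, target_units, recv_output, send_to):
    """Standard forward propagation on one training unit: wait for everything, then compute."""
    # (1) waiting for data: block until the output data of ALL source training units has
    #     arrived; the slowest source unit decides when the computation may start.
    arrived = [recv_output(src) for src in source_units]
    # (2) overall output: compute the whole output of the current layer in one step.
    overall_input = sum(w @ od for w, od in zip(weights, arrived))
    output = np.maximum(overall_input + bias, 0.0)        # non-linear activation, e.g. ReLU
    # (3) overall propagation: send the whole output to every target training unit.
    for dst in target_units:
        send_to(dst, output)
    return output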
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the invention provides a deep neural network model-parallel fully-connected layer data exchange method and system, which overlaps the data communication and the data calculation of the fully-connected layers and accelerates model convergence while preserving accuracy, thereby solving the technical problem of high communication overhead in the prior art.
To achieve the above object, according to one aspect of the present invention, there is provided a deep neural network model-parallel fully-connected layer data exchange method, comprising:
(1) dividing each fully-connected layer FC_l, l ∈ [1, L], into N equal parts by its number of neurons to obtain N sub fully-connected layers, and distributing the sub fully-connected layers to N training units respectively, where L is the number of fully-connected layers;
(2) during the forward propagation of each sub fully-connected layer, obtaining the output data of each sub fully-connected layer in parallel with a semi-stop-and-wait forward propagation method;
(3) during the backward propagation of each sub fully-connected layer, obtaining the weight gradient and the threshold gradient of each sub fully-connected layer in parallel with a fixed-stop-and-wait backward propagation method, based on the output data of each sub fully-connected layer obtained by the semi-stop-and-wait forward propagation method;
(4) after one forward propagation and one backward propagation are finished, updating the weight data and the threshold data of each sub fully-connected layer in parallel using the weight gradient and the threshold gradient of each sub fully-connected layer.
Preferably, the step (2) specifically comprises:
(2.1) for each sub fully-connected layer FC_l^j: if the output data of any sub fully-connected layer FC_{l-1}^i of the front layer has arrived, compute the input data generated by FC_{l-1}^i for FC_l^j by the formula ID_l^{j,i} = W_{l-1,l}^{i,j} · OD_{l-1}^i, where the subscript l denotes the index of the fully-connected layer, the superscripts j and i denote the indexes of the sub fully-connected layers, W_{l-1,l}^{i,j} denotes the connection weights between sub fully-connected layer FC_{l-1}^i and sub fully-connected layer FC_l^j, OD_{l-1}^i denotes the output data of sub fully-connected layer FC_{l-1}^i, and ID_l^{j,i} denotes the input data generated by FC_{l-1}^i for FC_l^j;
(2.2) for the sub fully-connected layer FC_l^j: according to the results of step (2.1), compute its overall input data by the formula ID_l^j = Σ_i ID_l^{j,i}, where ID_l^j denotes the overall input data of sub fully-connected layer FC_l^j;
(2.3) for the sub fully-connected layer FC_l^j: according to the result of step (2.2), compute its output data by the formula OD_l^j = F(ID_l^j + B_l^j), where the function F denotes a non-linear activation function and B_l^j is the threshold data of sub fully-connected layer FC_l^j.
Preferably, step (3) specifically comprises:
(3.1) for each sub fully-connected layer FC_l^j: after the output residual data generated for FC_l^j by the sub fully-connected layers FC_{l+1}^k of the rear layer on Q training units have arrived, take these Q pieces of output residual data as input residual data of FC_l^j, denoted Iδ_l^{j,k};
(3.2) for the sub fully-connected layer FC_l^j: accumulate the Q pieces of input residual data of step (3.1) by the formula Iδ_l^j = Iδ_l^j + Σ_{k=1}^{Q} Iδ_l^{j,k}, where Iδ_l^j denotes the accumulated input residual data of FC_l^j;
(3.3) for the sub fully-connected layer FC_l^j: according to the result of step (3.2), compute in parallel the output residual data generated by FC_l^j for each sub fully-connected layer FC_{l-1}^i of the front layer, denoted Oδ_l^{j,i}, by the formula Oδ_l^{j,i} = (W_{l-1,l}^{i,j})^T · δ_l^j, where δ_l^j = Iδ_l^j ⊙ F'(ID_l^j + B_l^j), F' is the derivative of the activation function F and ⊙ denotes element-wise multiplication;
(3.4) for the sub fully-connected layer FC_l^j: according to the result of step (3.1), compute in parallel the weight gradient of the connection weights between FC_{l-1}^i and FC_l^j, denoted ∇W_{l-1,l}^{i,j}, by the formula ∇W_{l-1,l}^{i,j} = δ_l^j · (OD_{l-1}^i)^T;
(3.5) for the sub fully-connected layer FC_l^j: according to the result of step (3.2), compute the threshold gradient of FC_l^j, denoted ∇B_l^j, by the formula ∇B_l^j = δ_l^j · V, where V is a unit vector whose dimension equals the size of a batch processing block in training;
(3.6) for the sub fully-connected layer FC_l^j: repeat steps (3.1) to (3.5), each time processing the output residual data generated for FC_l^j by Q sub fully-connected layers of the rear layer, until FC_l^j has finished processing all the output residual data of the rear layer.
Preferably, the step (4) specifically comprises:
(4.1) update the weight data of each sub fully-connected layer in parallel by the formula W_{l-1,l}^{i,j} = W_{l-1,l}^{i,j} - η · ∇W_{l-1,l}^{i,j}, where η denotes the learning rate;
(4.2) update the threshold data of each sub fully-connected layer in parallel by the formula B_l^j = B_l^j - η · ∇B_l^j.
According to another aspect of the present invention, there is provided a deep neural network model-parallel fully-connected layer data exchange system, comprising:
a partitioning module, configured to divide each fully-connected layer FC_l, l ∈ [1, L], into N equal parts by its number of neurons to obtain N sub fully-connected layers, and to distribute the sub fully-connected layers to N training units respectively, where L is the number of fully-connected layers;
a forward propagation module, configured to obtain the output data of each sub fully-connected layer in parallel with the semi-stop-and-wait forward propagation method during the forward propagation of each sub fully-connected layer;
a backward propagation module, configured to obtain the weight gradient and the threshold gradient of each sub fully-connected layer in parallel with the fixed-stop-and-wait backward propagation method, based on the output data of each sub fully-connected layer obtained by the semi-stop-and-wait forward propagation method, during the backward propagation of each sub fully-connected layer;
and an updating module, configured to update the weight data and the threshold data of each sub fully-connected layer in parallel using the weight gradient and the threshold gradient of each sub fully-connected layer after one forward propagation and one backward propagation are finished.
Generally, compared with the prior art, the above technical solutions conceived by the present invention mainly have the following technical advantages:
(1) the calculation parallelism is high: each training unit processes the current data in parallel;
(2) the communication overhead is small: both the semi-stop-and-wait forward propagation method and the fixed-stop-and-wait backward propagation method maximize the overlap between calculation time and communication time in deep neural network training, thereby reducing the communication overhead of training.
Drawings
FIG. 1 is a schematic diagram of a deep neural network architecture in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a fully-connected layer structure of model parallelism in a deep neural network in an embodiment of the present invention;
FIG. 3 is a schematic flow chart of the overall process in an embodiment of the invention;
FIG. 4 is a flow chart of the semi-stop-and-wait forward propagation method in an embodiment of the present invention;
FIG. 5 is a flow chart of the fixed-stop-and-wait backward propagation method in an embodiment of the present invention;
fig. 6 is a flowchart illustrating a method for updating weight data and threshold data according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention consists of two parts, a semi-stop-and-wait forward propagation method and a fixed-stop-and-wait backward propagation method. The core idea of the semi-stop-and-wait forward propagation method is as follows:
(1) partial calculation: the training unit processes the front-layer output data that have already arrived, producing the corresponding part of its Input Data (ID);
(2) overall output: the training unit combines all the calculation results of step (1) into its own input data and computes its Output Data (OD) from them;
(3) overall propagation: the training unit propagates the output data to the target training units as their input data.
The core idea of the fixed-stop-and-wait backward propagation method is as follows:
(1) quantitative calculation: each time Q pieces (Q ∈ [1, N], where Q is a constant set by the user and N is the number of training units) of the residual data (also called error data, denoted δ in the invention) output by the rear layer have arrived, the training unit processes them;
(2) quantitative propagation: the training unit propagates the calculation results of step (1) to the target training units as their input residual data (Iδ);
(3) repeat step (1) and step (2) until all N pieces of output residual data (Oδ) of the rear layer have been processed.
The overall idea of the invention is that, in the model-parallel training of a deep neural network, the semi-stop-and-wait forward propagation method replaces the standard forward propagation method and the fixed-stop-and-wait backward propagation method replaces the standard backward propagation method, so that the calculation time and the communication time of training overlap and the communication overhead of training is reduced.
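Purely as an illustration (not the patent's reference implementation), the following Python sketch shows how one training iteration on a single training unit chains the three replaced phases together; semi_stop_wait_forward, fixed_stop_wait_backward and update_parameters are hypothetical names for the procedures of steps (2), (3) and (4) below, whose arguments are elided here and which are sketched after the corresponding steps.

def train_iteration(unit):
    """One iteration on one training unit: forward pass, backward pass, parameter update."""
    for l in range(1, unit.L + 1):              # forward, from the front layers towards the rear
        semi_stop_wait_forward(unit, l)         # step (2): compute output data as inputs arrive
    for l in range(unit.L, 0, -1):              # backward, from the rear layers towards the front
        fixed_stop_wait_backward(unit, l)       # step (3): process residual data Q pieces at a time
    update_parameters(unit)                     # step (4): update weight and threshold data in parallel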
Fig. 3 is a schematic general flow chart of the method in the embodiment of the present invention, and the method shown in fig. 3 includes:
(1) dividing each fully-connected layer FC_l, l ∈ [1, L], into N equal parts by its number of neurons to obtain N sub fully-connected layers, and distributing the sub fully-connected layers to N training units respectively, where L is the number of fully-connected layers;
Specifically, a fully-connected layer FC_l, l ∈ [1, L], is divided by its number of neurons into N equal parts, giving N sub fully-connected layers denoted FC_l^1, FC_l^2, …, FC_l^N, which are distributed to N training units. A training unit is an independent computing node, which may be a GPU card or a server node. The other fully-connected layers are processed in the same way, which yields a network model in which the fully-connected layers of the deep neural network are model-parallel, as shown in fig. 2: each fully-connected layer is divided into N parts held by N training units respectively, where N is the number of training units, i.e. the number of parts of each fully-connected layer, and L is the number of fully-connected layers; thin lines denote connections between neurons, and thick lines denote connections between components (here, a part of a layer).
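A minimal sketch, under assumed matrix shapes, of this even split by neurons: W_full has one row per neuron of FC_l and one column per neuron of FC_{l-1}, and B_full holds one threshold per neuron of FC_l; the names and the NumPy layout are illustrative assumptions, not taken from the patent.

import numpy as np

def split_fc_layer(W_full, B_full, N):
    """Divide one fully-connected layer evenly by neuron into N sub fully-connected layers."""
    neuron_blocks = np.array_split(np.arange(W_full.shape[0]), N)
    # training unit j receives the weights and thresholds of the neurons belonging to FC_l^j
    return [{"W": W_full[rows, :].copy(), "B": B_full[rows].copy()} for rows in neuron_blocks]

# example: a 4096-neuron fully-connected layer fed by 9216 inputs, split across N = 4 units
subs = split_fc_layer(np.random.randn(4096, 9216), np.zeros((4096, 1)), N=4)
assert subs[0]["W"].shape == (1024, 9216) and subs[0]["B"].shape == (1024, 1)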
(2) During the forward propagation of each sub fully-connected layer, the output data of each sub fully-connected layer are obtained in parallel with the semi-stop-and-wait forward propagation method.
The semi-stop-and-wait forward propagation method, shown in fig. 4, specifically includes:
(2.1) for the sub fully-connected layer FC_l^j: if the output data of any sub fully-connected layer FC_{l-1}^i of the front layer has arrived, compute the input data generated by FC_{l-1}^i for FC_l^j. The calculation formula is: ID_l^{j,i} = W_{l-1,l}^{i,j} · OD_{l-1}^i.
In the formula, the subscript l denotes the index of the fully-connected layer, and the superscripts j and i denote the indexes of the sub fully-connected layers, that is, the indexes of the training units. W_{l-1,l}^{i,j} denotes the connection weights between sub fully-connected layer FC_{l-1}^i and sub fully-connected layer FC_l^j, OD_{l-1}^i denotes the output data of FC_{l-1}^i, and ID_l^{j,i} denotes the input data generated by FC_{l-1}^i for FC_l^j;
(2.2) for the sub fully-connected layer FC_l^j: according to the results of step (2.1), compute its overall input data. The calculation formula is: ID_l^j = Σ_i ID_l^{j,i},
where ID_l^j denotes the overall input data of FC_l^j;
(2.3) for the sub fully-connected layer FC_l^j: according to the result of step (2.2), compute its final output data. The calculation formula is: OD_l^j = F(ID_l^j + B_l^j),
where OD_l^j denotes the output data of FC_l^j, the function F is a non-linear activation function (e.g. the ReLU function), and B_l^j is the threshold data of FC_l^j;
(2.4) the other sub fully-connected layers FC_l^1, …, FC_l^N are processed in the same way in parallel according to steps (2.1) to (2.3).
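The following NumPy sketch illustrates steps (2.1) to (2.3) on one training unit, under assumptions not stated in the patent: arrivals is a hypothetical iterator that yields pairs (i, OD_prev) as soon as the output data of a front-layer sub fully-connected layer reaches this unit, W[i] is the locally held connection-weight block W_{l-1,l}^{i,j}, and ReLU is used as the activation function F.

import numpy as np

def semi_stop_wait_forward(W, B, arrivals, send, target_units, n_neurons, batch_size):
    """Semi-stop-and-wait forward propagation for one sub fully-connected layer FC_l^j."""
    ID = np.zeros((n_neurons, batch_size))
    # (2.1) partial calculation: fold in ID_l^{j,i} = W_{l-1,l}^{i,j} . OD_{l-1}^i as soon as
    #       the output data OD_{l-1}^i of front sub-layer FC_{l-1}^i arrives, so computation
    #       overlaps with the transfers that are still in flight.
    for i, OD_prev in arrivals():
        ID += W[i] @ OD_prev
    # (2.2) ID now holds the overall input data ID_l^j = sum_i ID_l^{j,i}.
    OD = np.maximum(ID + B, 0.0)          # (2.3) OD_l^j = F(ID_l^j + B_l^j), here with F = ReLU
    for dst in target_units:              # propagate OD_l^j to the rear-layer training units
        send(dst, OD)
    return ID, OD                         # ID is kept because the backward pass needs F'(ID + B)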
(3) During the backward propagation of each sub fully-connected layer, the weight gradient and the threshold gradient of each sub fully-connected layer are obtained in parallel with the fixed-stop-and-wait backward propagation method, based on the output data of each sub fully-connected layer obtained by the semi-stop-and-wait forward propagation method.
the stop-and-go back propagation method shown in fig. 5 specifically includes:
(3.1) for the sub fully-connected layer FC_l^j: each time the output residual data generated for FC_l^j by the sub fully-connected layers FC_{l+1}^k of the rear layer on Q training units have arrived (that is, have been copied from the source training units to the target training unit), take these Q pieces of output residual data as input residual data of FC_l^j, denoted Iδ_l^{j,k};
(3.2) for the sub fully-connected layer FC_l^j: accumulate the Q pieces of input residual data of step (3.1), denoted Iδ_l^j. The calculation formula is: Iδ_l^j = Iδ_l^j + Σ_{k=1}^{Q} Iδ_l^{j,k};
(3.3) for the sub fully-connected layer FC_l^j: according to the result of step (3.2), compute in parallel the output residual data generated by FC_l^j for each sub fully-connected layer FC_{l-1}^i of the front layer, denoted Oδ_l^{j,i}. The calculation formula is: Oδ_l^{j,i} = (W_{l-1,l}^{i,j})^T · δ_l^j, where δ_l^j = Iδ_l^j ⊙ F'(ID_l^j + B_l^j), F' is the derivative of the activation function F and ⊙ denotes element-wise multiplication;
(3.4) for the sub fully-connected layer FC_l^j: according to the result of step (3.1), compute in parallel the weight gradient of the connection weights between FC_{l-1}^i and FC_l^j, denoted ∇W_{l-1,l}^{i,j}. The calculation formula is: ∇W_{l-1,l}^{i,j} = δ_l^j · (OD_{l-1}^i)^T;
(3.5) for the sub fully-connected layer FC_l^j: according to the result of step (3.2), compute the threshold gradient of FC_l^j, denoted ∇B_l^j. The calculation formula is: ∇B_l^j = δ_l^j · V,
where V is a unit vector whose dimension equals the size of a batch processing block in training;
(3.6) for the sub fully-connected layer FC_l^j: repeat steps (3.1) to (3.5), each time processing the output residual data generated for FC_l^j by Q sub fully-connected layers of the rear layer, until FC_l^j has finished processing all the output residual data of the rear layer;
(3.7) the other sub fully-connected layers FC_l^1, …, FC_l^N are processed in the same way in parallel according to steps (3.1) to (3.6).
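The following NumPy sketch, an illustration under assumptions not stated in the patent, shows steps (3.1) to (3.5) for one sub fully-connected layer FC_l^j with F = ReLU. arrivals_in_groups is a hypothetical iterator that yields a list of Q residual arrays each time Q pieces have arrived; W maps each front-layer index i to the locally held block W_{l-1,l}^{i,j}; OD_prev holds the front-layer outputs saved during the forward pass; V is taken as an all-ones vector over the batch, so the threshold gradient becomes a sum over the batch axis. Per-round contributions are simply summed, which is equivalent to using the fully accumulated residual because every quantity below is linear in it, and the output residual data are sent once at the end as a simplification.

import numpy as np

def fixed_stop_wait_backward(W, B, ID, OD_prev, arrivals_in_groups, send, front_units):
    """Fixed-stop-and-wait backward propagation for one sub fully-connected layer FC_l^j."""
    act_grad = (ID + B > 0.0).astype(ID.dtype)            # F'(ID_l^j + B_l^j) for F = ReLU
    grad_W = {i: np.zeros_like(Wi) for i, Wi in W.items()}
    grad_B = np.zeros_like(B)
    out_delta = {i: np.zeros((Wi.shape[1], ID.shape[1])) for i, Wi in W.items()}
    # (3.1)/(3.2) each time Q pieces of rear-layer output residual data have arrived,
    # fold their contribution into the gradients and the output residual data.
    for group in arrivals_in_groups():                    # a list of Q residual arrays per round
        delta = sum(group) * act_grad                     # this round's part of delta_l^j
        for i, Wi in W.items():
            out_delta[i] += Wi.T @ delta                  # (3.3) residual for front sub-layer FC_{l-1}^i
            grad_W[i] += delta @ OD_prev[i].T             # (3.4) weight gradient contribution
        grad_B += delta.sum(axis=1, keepdims=True)        # (3.5) threshold gradient (delta . V)
    for i in front_units:                                 # propagate output residual data to the
        send(i, out_delta[i])                             # front-layer training units (sent once here)
    return grad_W, grad_B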
(4) After one forward propagation and one backward propagation are finished, the weight data and the threshold data of each sub fully-connected layer are updated in parallel using the weight gradient and the threshold gradient of each sub fully-connected layer.
The method for updating the weight data and the threshold data shown in fig. 6 specifically includes:
(4.1) for all sub fully-connected layers FC_l^j, update the weight data in parallel. The calculation formula is: W_{l-1,l}^{i,j} = W_{l-1,l}^{i,j} - η · ∇W_{l-1,l}^{i,j}, where η denotes the learning rate;
(4.2) for all sub fully-connected layers FC_l^j, update the threshold data in parallel. The calculation formula is: B_l^j = B_l^j - η · ∇B_l^j.
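For illustration only, a minimal sketch of the parallel update of steps (4.1) and (4.2) for one sub fully-connected layer, reusing the dictionaries returned by the backward sketch above; plain gradient descent with learning rate eta is assumed, matching the formulas, and each training unit runs this on its own sub-layers independently.

def update_parameters(W, B, grad_W, grad_B, eta):
    """Update the weight data and threshold data of one sub fully-connected layer in place."""
    for i in W:                        # (4.1) W_{l-1,l}^{i,j} = W_{l-1,l}^{i,j} - eta * grad_W
        W[i] -= eta * grad_W[i]
    B -= eta * grad_B                  # (4.2) B_l^j = B_l^j - eta * grad_B
    return W, B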
in one embodiment of the invention, a deep neural network model parallel full-connection layer data exchange system is disclosed, which comprises:
a partitioning module, configured to divide each fully-connected layer FC_l, l ∈ [1, L], into N equal parts by its number of neurons to obtain N sub fully-connected layers, and to distribute the sub fully-connected layers to N training units respectively, where L is the number of fully-connected layers;
a forward propagation module, configured to obtain the output data of each sub fully-connected layer in parallel with the semi-stop-and-wait forward propagation method during the forward propagation of each sub fully-connected layer;
a backward propagation module, configured to obtain the weight gradient and the threshold gradient of each sub fully-connected layer in parallel with the fixed-stop-and-wait backward propagation method, based on the output data of each sub fully-connected layer obtained by the semi-stop-and-wait forward propagation method, during the backward propagation of each sub fully-connected layer;
and an updating module, configured to update the weight data and the threshold data of each sub fully-connected layer in parallel using the weight gradient and the threshold gradient of each sub fully-connected layer after one forward propagation and one backward propagation are finished.
The specific implementation of each module may refer to the description of the method embodiment, and the embodiment of the present invention will not be repeated.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (5)

1. A deep neural network model-parallel fully-connected layer data exchange method, characterized by comprising the following steps:
(1) dividing each fully-connected layer FC_l, l ∈ [1, L], into N equal parts by its number of neurons to obtain N sub fully-connected layers, and distributing the sub fully-connected layers to N training units respectively, where L is the number of fully-connected layers;
(2) during the forward propagation of each sub fully-connected layer, obtaining the output data of each sub fully-connected layer in parallel with a semi-stop-and-wait forward propagation method;
(3) during the backward propagation of each sub fully-connected layer, obtaining the weight gradient and the threshold gradient of each sub fully-connected layer in parallel with a fixed-stop-and-wait backward propagation method, based on the output data of each sub fully-connected layer obtained by the semi-stop-and-wait forward propagation method;
(4) after one forward propagation and one backward propagation are finished, updating the weight data and the threshold data of each sub fully-connected layer in parallel using the weight gradient and the threshold gradient of each sub fully-connected layer.
2. The method according to claim 1, wherein step (2) comprises in particular:
(2.1) for each sub fully-connected layer FC_l^j: if the output data of any sub fully-connected layer FC_{l-1}^i of the front layer has arrived, computing the input data generated by FC_{l-1}^i for FC_l^j by the formula ID_l^{j,i} = W_{l-1,l}^{i,j} · OD_{l-1}^i, where the subscript l denotes the index of the fully-connected layer, the superscripts j and i denote the indexes of the sub fully-connected layers, W_{l-1,l}^{i,j} denotes the connection weights between sub fully-connected layer FC_{l-1}^i and sub fully-connected layer FC_l^j, OD_{l-1}^i denotes the output data of sub fully-connected layer FC_{l-1}^i, and ID_l^{j,i} denotes the input data generated by FC_{l-1}^i for FC_l^j;
(2.2) for the sub fully-connected layer FC_l^j: according to the results of step (2.1), computing its overall input data by the formula ID_l^j = Σ_i ID_l^{j,i}, where ID_l^j denotes the overall input data of sub fully-connected layer FC_l^j;
(2.3) for the sub fully-connected layer FC_l^j: according to the result of step (2.2), computing its output data by the formula OD_l^j = F(ID_l^j + B_l^j), where the function F denotes a non-linear activation function and B_l^j is the threshold data of sub fully-connected layer FC_l^j.
3. The method according to claim 2, wherein step (3) comprises in particular:
(3.1) for each sub fully-connected layer FC_l^j: after the output residual data generated for FC_l^j by the sub fully-connected layers FC_{l+1}^k of the rear layer on Q training units have arrived, taking these Q pieces of output residual data as input residual data of FC_l^j, denoted Iδ_l^{j,k};
(3.2) for the sub fully-connected layer FC_l^j: accumulating the Q pieces of input residual data of step (3.1) by the formula Iδ_l^j = Iδ_l^j + Σ_{k=1}^{Q} Iδ_l^{j,k}, where Iδ_l^j denotes the accumulated input residual data of FC_l^j;
(3.3) for the sub fully-connected layer FC_l^j: according to the result of step (3.2), computing in parallel the output residual data generated by FC_l^j for each sub fully-connected layer FC_{l-1}^i of the front layer, denoted Oδ_l^{j,i}, by the formula Oδ_l^{j,i} = (W_{l-1,l}^{i,j})^T · δ_l^j, where δ_l^j = Iδ_l^j ⊙ F'(ID_l^j + B_l^j), F' is the derivative of the activation function F and ⊙ denotes element-wise multiplication;
(3.4) for the sub fully-connected layer FC_l^j: according to the result of step (3.1), computing in parallel the weight gradient of the connection weights between FC_{l-1}^i and FC_l^j, denoted ∇W_{l-1,l}^{i,j}, by the formula ∇W_{l-1,l}^{i,j} = δ_l^j · (OD_{l-1}^i)^T;
(3.5) for the sub fully-connected layer FC_l^j: according to the result of step (3.2), computing the threshold gradient of FC_l^j, denoted ∇B_l^j, by the formula ∇B_l^j = δ_l^j · V, where V is a unit vector whose dimension equals the size of a batch processing block in training;
(3.6) for the sub fully-connected layer FC_l^j: repeating steps (3.1) to (3.5), each time processing the output residual data generated for FC_l^j by Q sub fully-connected layers of the rear layer, until FC_l^j has finished processing all the output residual data of the rear layer.
4. The method according to claim 3, characterized in that step (4) comprises in particular:
(4.1) updating the weight data of each sub fully-connected layer in parallel by the formula W_{l-1,l}^{i,j} = W_{l-1,l}^{i,j} - η · ∇W_{l-1,l}^{i,j}, where η denotes the learning rate;
(4.2) updating the threshold data of each sub fully-connected layer in parallel by the formula B_l^j = B_l^j - η · ∇B_l^j.
5. A deep neural network model-parallel fully-connected layer data exchange system, comprising:
a partitioning module, configured to divide each fully-connected layer FC_l, l ∈ [1, L], into N equal parts by its number of neurons to obtain N sub fully-connected layers, and to distribute the sub fully-connected layers to N training units respectively, where L is the number of fully-connected layers;
a forward propagation module, configured to obtain the output data of each sub fully-connected layer in parallel with the semi-stop-and-wait forward propagation method during the forward propagation of each sub fully-connected layer;
a backward propagation module, configured to obtain the weight gradient and the threshold gradient of each sub fully-connected layer in parallel with the fixed-stop-and-wait backward propagation method, based on the output data of each sub fully-connected layer obtained by the semi-stop-and-wait forward propagation method, during the backward propagation of each sub fully-connected layer;
and an updating module, configured to update the weight data and the threshold data of each sub fully-connected layer in parallel using the weight gradient and the threshold gradient of each sub fully-connected layer after one forward propagation and one backward propagation are finished.
CN201710191684.1A 2017-03-28 2017-03-28 Deep neural network model-parallel fully-connected layer data exchange method and system Active CN106991474B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710191684.1A CN106991474B (en) 2017-03-28 2017-03-28 Deep neural network model-parallel fully-connected layer data exchange method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710191684.1A CN106991474B (en) 2017-03-28 2017-03-28 Deep neural network model-parallel fully-connected layer data exchange method and system

Publications (2)

Publication Number Publication Date
CN106991474A true CN106991474A (en) 2017-07-28
CN106991474B CN106991474B (en) 2019-09-24

Family

ID=59413391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710191684.1A Active CN106991474B (en) Deep neural network model-parallel fully-connected layer data exchange method and system

Country Status (1)

Country Link
CN (1) CN106991474B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408175A (en) * 2018-09-28 2019-03-01 北京赛博贝斯数据科技有限责任公司 Real-time interaction method and system in general high-performance deep learning computing engines
CN109711358A (en) * 2018-12-28 2019-05-03 四川远鉴科技有限公司 Neural network training method, face identification method and system and storage medium
CN109976903A (en) * 2019-02-22 2019-07-05 华中科技大学 A kind of deep learning Heterogeneous Computing method and system based on slice width Memory Allocation
CN112418168A (en) * 2020-12-10 2021-02-26 深圳云天励飞技术股份有限公司 Vehicle identification method, device, system, electronic equipment and storage medium
WO2022018548A1 (en) * 2020-07-21 2022-01-27 International Business Machines Corporation Online training of neural networks
US11961001B2 (en) 2017-12-15 2024-04-16 Nvidia Corporation Parallel forward and backward propagation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104035751A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Graphics processing unit based parallel data processing method and device
US20150058268A1 (en) * 2012-01-27 2015-02-26 International Business Machines Corporation Hierarchical scalable neuromorphic synaptronic system for synaptic and structural plasticity
CN104463324A (en) * 2014-11-21 2015-03-25 长沙马沙电子科技有限公司 Convolution neural network parallel processing method based on large-scale high-performance cluster
CN105630882A (en) * 2015-12-18 2016-06-01 哈尔滨工业大学深圳研究生院 Remote sensing data deep learning based offshore pollutant identifying and tracking method
US20160267380A1 (en) * 2015-03-13 2016-09-15 Nuance Communications, Inc. Method and System for Training a Neural Network
CN106228240A (en) * 2016-07-30 2016-12-14 复旦大学 Degree of depth convolutional neural networks implementation method based on FPGA

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150058268A1 (en) * 2012-01-27 2015-02-26 International Business Machines Corporation Hierarchical scalable neuromorphic synaptronic system for synaptic and structural plasticity
CN104035751A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Graphics processing unit based parallel data processing method and device
CN104463324A (en) * 2014-11-21 2015-03-25 长沙马沙电子科技有限公司 Convolution neural network parallel processing method based on large-scale high-performance cluster
US20160267380A1 (en) * 2015-03-13 2016-09-15 Nuance Communications, Inc. Method and System for Training a Neural Network
CN105630882A (en) * 2015-12-18 2016-06-01 哈尔滨工业大学深圳研究生院 Remote sensing data deep learning based offshore pollutant identifying and tracking method
CN106228240A (en) * 2016-07-30 2016-12-14 复旦大学 Degree of depth convolutional neural networks implementation method based on FPGA

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHENX ETAL.: "Pipelined Back-Propagation for Context-Dependent Deep Neural Networks", 《INTERSPEECH》 *
王裕民: "多GPU环境下的卷积神经网络并行算法", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11961001B2 (en) 2017-12-15 2024-04-16 Nvidia Corporation Parallel forward and backward propagation
CN109408175A (en) * 2018-09-28 2019-03-01 北京赛博贝斯数据科技有限责任公司 Real-time interaction method and system in general high-performance deep learning computing engines
CN109711358A (en) * 2018-12-28 2019-05-03 四川远鉴科技有限公司 Neural network training method, face identification method and system and storage medium
CN109976903A (en) * 2019-02-22 2019-07-05 华中科技大学 A kind of deep learning Heterogeneous Computing method and system based on slice width Memory Allocation
US11568268B2 (en) 2019-02-22 2023-01-31 Huazhong University Of Science And Technology Deep learning heterogeneous computing method based on layer-wide memory allocation and system thereof
WO2022018548A1 (en) * 2020-07-21 2022-01-27 International Business Machines Corporation Online training of neural networks
GB2612504A (en) * 2020-07-21 2023-05-03 Ibm Online training of neural networks
CN112418168A (en) * 2020-12-10 2021-02-26 深圳云天励飞技术股份有限公司 Vehicle identification method, device, system, electronic equipment and storage medium
CN112418168B (en) * 2020-12-10 2024-04-02 深圳云天励飞技术股份有限公司 Vehicle identification method, device, system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN106991474B (en) 2019-09-24

Similar Documents

Publication Publication Date Title
CN106991474B (en) Deep neural network model-parallel fully-connected layer data exchange method and system
EP3540652B1 (en) Method, device, chip and system for training neural network model
CN111242282B (en) Deep learning model training acceleration method based on end edge cloud cooperation
US11568258B2 (en) Operation method
US10540587B2 (en) Parallelizing the training of convolutional neural networks
CN112733967B (en) Model training method, device, equipment and storage medium for federal learning
CN107229966B (en) Model data updating method, device and system
US20210295168A1 (en) Gradient compression for distributed training
KR20180045635A (en) Device and method to reduce neural network
CN111788585B (en) Training method and system for deep learning model
CN108009642A (en) Distributed machines learning method and system
CN113469355B (en) Multi-model training pipeline in distributed system
CN107341542A (en) Apparatus and method for performing Recognition with Recurrent Neural Network and LSTM computings
US11977972B2 (en) Residual semi-recurrent neural networks
CN111523648B (en) Neural network pulse synchronization method and system containing clustering topological coupling
WO2017167114A1 (en) Method and device for training model of quasi-alexnet
US11843587B2 (en) Systems and methods for tree-based model inference using multi-party computation
EP4320556A1 (en) Privacy-aware pruning in machine learning
Chen et al. Generative modeling with phase stochastic bridges
Naseh et al. Enabling Intelligent Vehicular Networks Through Distributed Learning in the Non-Terrestrial Networks 6G Vision
US20220027796A1 (en) Hierarchical decentralized distributed deep learning training
KR102090109B1 (en) Learning and inference apparatus and method
CN113887740A (en) Method, device and system for jointly updating model
CN112861991A (en) Learning rate adjusting method for neural network asynchronous training
CN108334939A (en) Convolutional neural networks accelerator based on more FPGA ring communications and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant