CN111985603A - Method for training sparse connection neural network - Google Patents

Method for training sparse connection neural network

Info

Publication number
CN111985603A
CN111985603A (application CN202010123340.9A)
Authority
CN
China
Prior art keywords: weight, connectivity, variable, mask, neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010123340.9A
Other languages
Chinese (zh)
Inventor
唐志敏
谢必克
朱逸煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kneron Inc
Kneron Taiwan Co Ltd
Original Assignee
Kneron Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/746,941 external-priority patent/US20200372363A1/en
Application filed by Kneron Inc filed Critical Kneron Inc
Publication of CN111985603A publication Critical patent/CN111985603A/en
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Abstract

The invention provides a method for training a sparsely connected neural network. When the neural network is trained, each weight is decomposed into the product of a weight variable and a binary mask, the binary mask being obtained from a mask variable through a unit step function. An element of the binary mask indicates whether the weight at the corresponding position has a connection: 0 represents no connection and 1 represents a connection. If most elements of the binary mask are 0, training produces a sparsely connected neural network. The number of connected weights, i.e., the number of elements in the binary mask that are equal to 1, is taken as one term of the objective function. During training, the weight variables and the mask variables are adjusted according to the objective function, and the values of the mask variables are gradually attenuated to ensure the sparsity of the binary mask.

Description

Method for training sparse connection neural network
Technical Field
The present invention relates to artificial neural networks, and in particular to a method of training a sparsely connected neural network.
Background
An artificial neural network is a network comprising a plurality of processing units arranged in multiple layers. Neural networks trained by conventional training methods are often densely connected, i.e., all weights are non-zero. However, such network architectures are typically complex, require significant memory resources and power consumption, and often suffer from over-fitting. A neural network with sparse weights can be obtained by pruning, which sets weights with small absolute values to 0; however, the absolute value of a weight does not represent the importance of its connection, so it is difficult to obtain an optimal connection structure by pruning.
Disclosure of Invention
An embodiment of the invention provides a method for training a sparsely connected neural network. The method is as follows: during training of the neural network, each weight is decomposed into the product of a weight variable and a binary (0/1) mask, the binary mask being derived from a mask variable through a unit step function. An element of the binary mask indicates whether the weight at the corresponding position has a connection: 0 represents no connection and 1 represents a connection. If most elements of the binary mask are 0, training produces a sparsely connected neural network. The number of connected weights, i.e., the number of elements in the binary mask that are equal to 1, is taken as one term of the objective function. The training process adjusts the weight variables and the mask variables according to the objective function. The values of the mask variables are gradually attenuated during training, ensuring that the binary mask is sparse. Since the mask variables are determined by the objective function, only the binary mask elements corresponding to a few important weights remain 1.
As a result, an artificial neural network with sparse connections, a simple structure and accurate output predictions is produced, and the resulting sparse connection structure can significantly reduce computational complexity, memory requirements and power consumption.
Drawings
Fig. 1 is a calculation diagram of an artificial neural network according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the convolutional layer of the artificial neural network in fig. 1.
Fig. 3 is a flowchart of a training method of the artificial neural network in fig. 1.
Fig. 4 is a diagram of a computing network for constructing the artificial neural network in Fig. 1 according to an embodiment.
Reference numerals:
1: artificial neural network
300: training method
S302 to S306: steps
4: computing network
402: processor
404: program memory
406: parameter memory
408: output interface
Lyr(1) to Lyr(J): layers
c_1^(j) to c_{|Cj|}^(j): connections
n_1^(j) to n_{|Nj|}^(j): processing nodes
m: connectivity mask
m̃: connectivity variable
w: weight
w̃: weight variable
x, x_1^(1) to x_{|N1|}^(1): input data
y, y_1^(J) to y_{|NJ|}^(J): output estimated values
Y(1) to Y(|NJ|): target values
*: convolution operation
☉: element-wise multiplication
Detailed Description
Fig. 1 is a computational diagram of an artificial neural network 1 according to an embodiment of the present invention. The artificial neural network 1 is a fully connected neural network, but the invention is also applicable to other types of neural networks such as convolutional neural networks. The artificial neural network 1 generates output estimates y_1^(J) to y_{|NJ|}^(J) in response to input data x_1^(1) to x_{|N1|}^(1). The input data x_1^(1) to x_{|N1|}^(1) may be current levels, voltage levels, real signals, complex signals, analog signals or digital signals. For example, the input data x_1^(1) to x_{|N1|}^(1) may be gray-scale values of pixels acquired by an input device such as a mobile phone, a tablet computer or a digital camera. The output estimates y_1^(J) to y_{|NJ|}^(J) may represent the probabilities of various classification results of the artificial neural network 1; for example, they may be the probabilities of various objects being recognized in an image. A set of input data x_1^(1) to x_{|N1|}^(1) may be referred to as an input data set. The artificial neural network 1 may be trained using sets of input data and respective sets of target values. In some embodiments, the input data sets may be divided into a plurality of mini-batches during training. For example, 32,000 input data sets may be divided into 1,000 mini-batches, each containing 32 input data sets.
The artificial neural network 1 may comprise layers Lyr(1) to Lyr(J), J being a positive integer greater than 1. Layer Lyr(1) may be referred to as the input layer, layer Lyr(J) as the output layer, and layers Lyr(2) to Lyr(J-1) as hidden layers. Each layer Lyr(j) may include a plurality of processing nodes coupled via connections c_1^(j) to c_{|Cj|}^(j) to the processing nodes in the previous layer Lyr(j-1), where j is a layer index between 2 and J and |Cj| is the total number of connections between layer Lyr(j) and the previous layer Lyr(j-1). The input layer Lyr(1) may comprise processing nodes n_1^(1) to n_{|N1|}^(1), where the superscript denotes the layer index, the subscript denotes the node index, and |N1| is the total number of processing nodes of layer Lyr(1). Processing nodes n_1^(1) to n_{|N1|}^(1) may receive the input data x_1^(1) to x_{|N1|}^(1), respectively. Each hidden layer Lyr(j), 2 ≤ j ≤ J-1, may include processing nodes n_1^(j) to n_{|Nj|}^(j), where |Nj| is the total number of processing nodes of the hidden layer Lyr(j). The output layer Lyr(J) may comprise processing nodes n_1^(J) to n_{|NJ|}^(J), where |NJ| is the total number of processing nodes of the output layer Lyr(J). Processing nodes n_1^(J) to n_{|NJ|}^(J) may generate the output estimates y_1^(J) to y_{|NJ|}^(J), respectively.
Each processing node in layer Lyr(j) may be coupled to one or more processing nodes in the previous layer Lyr(j-1) via its connections. Each connection may be associated with a weight, and a processing node may compute a weighted sum of the input data from the one or more processing nodes in the previous layer Lyr(j-1). Connections associated with larger weights have more influence on the weighted sum than connections associated with smaller weights. When a weight is 0, the connection associated with that weight can be regarded as removed from the artificial neural network 1, making the network connections sparse and reducing computational complexity, power consumption and operating cost. The artificial neural network 1 may be trained to produce an optimized sparse network configuration that uses a small or minimal number of connections c_1^(j) to c_{|Cj|}^(j) while producing output estimates y_1^(J) to y_{|NJ|}^(J) that approximately match the respective target values Y(1) to Y(|NJ|).
The method can be applied to different network types, such as fully connected neural networks or convolutional neural networks. Computationally, a fully connected layer in a fully connected neural network can be equivalently converted into a convolutional layer whose input feature map has a size of 1x1xN1 (layer Lyr(1) in Fig. 1) and whose convolution kernel has a size of 1x1xN1xN2, where N1 and N2 are positive integers. The training method for the sparsely connected network is therefore described with reference to Fig. 2 in the form of a convolutional layer. Fig. 2 shows a convolutional layer, which may be converted from one of the layers Lyr(2) to Lyr(J) of the artificial neural network 1. The convolutional layer may be coupled to the previous convolutional layer via connections. The convolutional layer may receive input data x from the previous convolutional layer and perform a convolution operation on the input data x and a weight w to compute an output estimate y, as expressed by equation (1):
y = w * x    (1)
The input data x may have a size of 1x1. The weight w may be referred to as a convolution kernel and may have a size of 1x1. "*" denotes the convolution operation. The output estimate y may be sent to the subsequent convolutional layer as its input data to compute subsequent output estimates. The weight w may be reparameterized into a weight variable w̃ and a connectivity mask m, as expressed by equation (2):
w = w̃ ☉ m    (2)
The connectivity mask m may be binary data representing the connectivity of a connection, where 1 represents the presence of a connection and 0 represents the absence of a connection. The weight variable w̃ may indicate the strength of the connection. "☉" denotes element-wise multiplication. The connectivity mask m may be derived from a connectivity variable m̃ by a unit step function H(·), as expressed by equation (3):
m = H(m̃)    (3)
That is, the convolutional layer binarizes the connectivity variable m̃ according to the unit step function H(·) to produce the connectivity mask m. By reparameterizing the weight w, the connectivity and the strength of a connection can be trained separately by adjusting the connectivity variable m̃ and the weight variable w̃, respectively. If the connectivity variable m̃ is less than or equal to 0, the weight variable w̃ is masked by 0 to produce a weight w of 0; if the connectivity variable m̃ exceeds 0, the weight variable w̃ is taken as the weight w.
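For illustration, the forward computation of equations (1) to (3) can be sketched in a few lines of Python. The sketch below assumes PyTorch; the tensor shapes and the function name reparameterized_weight are illustrative assumptions and not part of the disclosure.

    import torch
    import torch.nn.functional as F

    def reparameterized_weight(w_tilde, m_tilde):
        # Equation (3): unit step H(.) produces the binary connectivity mask
        m = (m_tilde > 0).float()
        # Equation (2): element-wise masking of the weight variable
        return w_tilde * m

    # Illustrative shapes: a 1x1 convolution kernel with 3 input and 4 output channels
    w_tilde = torch.randn(4, 3, 1, 1)   # weight variables (connection strengths)
    m_tilde = torch.randn(4, 3, 1, 1)   # connectivity variables; <= 0 means "no connection"
    w = reparameterized_weight(w_tilde, m_tilde)

    x = torch.randn(1, 3, 1, 1)         # input feature map of size 1x1
    y = F.conv2d(x, w)                  # equation (1): y = w * x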
In the artificial neural network 1, the connections c_1^(j) to c_{|Cj|}^(j) may be associated with connectivity variables m̃_1^(j) to m̃_{|Cj|}^(j) and weight variables w̃_1^(j) to w̃_{|Cj|}^(j), respectively. The connectivity variables m̃_1^(j) to m̃_{|Cj|}^(j) and the weight variables w̃_1^(j) to w̃_{|Cj|}^(j) are trained according to an objective function to reduce the total number of connections c_1^(j) to c_{|Cj|}^(j) while reducing the performance loss of the artificial neural network 1. The total number of connections c_1^(j) to c_{|Cj|}^(j) can be computed by summing all connectivity masks m_1^(j) to m_{|Cj|}^(j). The performance loss may represent the difference between the output estimates y_1^(J) to y_{|NJ|}^(J) and the respective target values Y(1) to Y(|NJ|), and may be computed as a cross entropy. The objective function L may be expressed by equation (4):
L = CE + λ1·Σ_j Σ_{i=1}^{|Cj|} m_i^(j) + λ2·Σ_j Σ_{i=1}^{|Cj|} (w̃_i^(j))^2    (4)
where CE is the cross entropy;
λ1 is the connection attenuation coefficient;
λ2 is the weight attenuation coefficient;
j is the layer index;
i is the mask index or weight index;
m_i^(j) is the i-th connectivity mask of the j-th layer;
|Cj| is the total number of connections of the j-th layer; and
w̃_i^(j) is the i-th weight variable of the j-th layer.
The objective function L may include the cross entropy CE between the output estimates y_1^(J) to y_{|NJ|}^(J) and the respective target values Y(1) to Y(|NJ|), an L0 regularization term on the total number of connections c_1^(j) to c_{|Cj|}^(j), and an L2 regularization term on the weight variables w̃_1^(j) to w̃_{|Cj|}^(j) associated with the connections c_1^(j) to c_{|Cj|}^(j). In some embodiments, the sum of squared errors between the output estimates y_1^(J) to y_{|NJ|}^(J) and the respective target values Y(1) to Y(|NJ|) may replace the cross entropy in the objective function L. The L0 regularization term may be the product of the connection attenuation coefficient λ1 and the sum of the connectivity masks m_1^(j) to m_{|Cj|}^(j). The L2 regularization term may be the product of the weight attenuation coefficient λ2 and the sum over the weight variables w̃_1^(j) to w̃_{|Cj|}^(j). In some embodiments, the L2 regularization term may be removed from the objective function L. The artificial neural network 1 may be trained to minimize the objective function L. The L0 regularization term therefore suppresses a large number of connections, and the L2 regularization term suppresses large weight variables w̃_1^(j) to w̃_{|Cj|}^(j). The larger the connection attenuation coefficient λ1, the sparser the artificial neural network 1 becomes. The connection attenuation coefficient λ1 can be set to a large constant to push the connectivity masks m_1^(j) to m_{|Cj|}^(j) toward 0 and the connectivity variables m̃_1^(j) to m̃_{|Cj|}^(j) in the negative direction, creating a sparse connection structure for the artificial neural network 1. Only when a connection c_i^(j) is important for reducing the cross entropy CE will its associated connectivity mask m_i^(j) remain at 1. In this way, a balance between reducing the cross entropy CE and reducing the total number of connections is achieved, resulting in a sparse connection structure while providing output estimates y_1^(J) to y_{|NJ|}^(J) that substantially match the target values Y(1) to Y(|NJ|). Similarly, the weight attenuation coefficient λ2 can be set to a large constant to reduce the weight variables w̃_1^(j) to w̃_{|Cj|}^(j), while the cross entropy CE ensures that important weight variables remain in the artificial neural network 1, resulting in a simple and accurate model of the artificial neural network 1.
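A minimal sketch of the objective function of equation (4) in Python (again assuming PyTorch) is shown below. The function name, the default coefficient values and the use of squared weight variables in the L2 term are illustrative assumptions rather than part of the disclosure; masks and weight_vars are lists holding the per-layer connectivity masks and weight variables.

    import torch
    import torch.nn.functional as F

    def objective(output, target, masks, weight_vars, lambda1=1e-3, lambda2=1e-4):
        """Equation (4): cross entropy plus an L0 term on the connectivity masks
        and an L2 term on the weight variables of every layer."""
        ce = F.cross_entropy(output, target)              # performance loss CE
        l0 = sum(m.sum() for m in masks)                  # total number of active connections
        l2 = sum(w.pow(2).sum() for w in weight_vars)     # squared weight variables
        return ce + lambda1 * l0 + lambda2 * l2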
In training the connectivity variables m̃_1^(j) to m̃_{|Cj|}^(j), the input data x_1^(1) to x_{|N1|}^(1) may be fed into the input layer Lyr(1) and forward-propagated from layer Lyr(1) to layer Lyr(J) to produce the output estimates y_1^(J) to y_{|NJ|}^(J). The error between the output estimates y_1^(J) to y_{|NJ|}^(J) and their respective target values Y(1) to Y(|NJ|) may be computed and back-propagated from layer Lyr(J) to layer Lyr(2) to compute the connectivity variable slopes ∂L/∂m̃ of the objective function L with respect to the connectivity variables m̃_1^(j) to m̃_{|Cj|}^(j), and the connectivity variables m̃_1^(j) to m̃_{|Cj|}^(j) are then adjusted according to their slopes, thereby reducing the total number of connections c_1^(j) to c_{|Cj|}^(j) while reducing the performance loss of the artificial neural network 1. In particular, a connectivity variable m̃ can be adjusted continuously until its connectivity variable slope ∂L/∂m̃ reaches 0, so as to find a local minimum of the cross entropy CE. However, according to the chain rule, the connectivity variable slope ∂L/∂m̃ involves the derivative of the unit step function in equation (3), and this derivative is 0 at almost every value of the connectivity variable m̃. The connectivity variable slope ∂L/∂m̃ therefore becomes 0, the training procedure stalls, and the connectivity variable m̃ is not updated. To keep the connectivity variables trainable during the training procedure, the unit step function is skipped during back propagation, and the connectivity variable slope ∂L/∂m̃ is redefined as the connectivity mask slope ∂L/∂m of the objective function L with respect to the connectivity mask m, as expressed by equation (5):
∂L/∂m̃ = ∂L/∂m = (∂L/∂w) ☉ w̃    (5)
Referring to Fig. 2, the dashed line between the connectivity mask m and the connectivity variable m̃ indicates that the unit step function is skipped during back propagation. The connectivity variable m̃ may be updated according to the connectivity mask slope ∂L/∂m. In some embodiments, the connectivity mask slope ∂L/∂m may be obtained as the element-wise multiplication of the corresponding weight slope ∂L/∂w and the corresponding weight variable w̃, as shown in equation (5). In this way, when a connection is determined not to be important for reducing the cross entropy CE, its connectivity variable m̃ can be updated from positive to negative and its connectivity mask updated from 1 to 0; when a connection is determined to be important for reducing the cross entropy CE, its connectivity variable m̃ can be updated from negative to positive and its connectivity mask updated from 0 to 1. In some embodiments, each mini-batch of input data sets may be input into the artificial neural network 1 to generate multiple sets of output estimates y_1^(J) to y_{|NJ|}^(J), the average error of the multiple sets of output estimates may be computed, and the connectivity variables m̃_1^(j) to m̃_{|Cj|}^(j) may be trained by back-propagating the average error. In some embodiments, to avoid the connectivity variable slope ∂L/∂m̃ or the connectivity mask slope ∂L/∂m falling into a different numerical range from the weight variables w̃, the input data sets of each mini-batch may be normalized to a standard deviation of 1.
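The gradient redefinition of equation (5), i.e., skipping the unit step function during back propagation, can be sketched as a custom autograd function in Python (assuming PyTorch). The class name BinaryStep is an illustrative assumption; the forward pass applies H(·) and the backward pass passes the incoming connectivity mask slope straight through to the connectivity variable.

    import torch

    class BinaryStep(torch.autograd.Function):
        """Unit step H(.) in the forward pass; the step is skipped in the backward pass,
        so the connectivity variable receives the connectivity mask slope of equation (5)."""

        @staticmethod
        def forward(ctx, m_tilde):
            return (m_tilde > 0).float()

        @staticmethod
        def backward(ctx, grad_mask):
            # dL/dm_tilde is redefined as dL/dm: pass the gradient straight through
            return grad_mask

    m_tilde = torch.randn(4, 3, 1, 1, requires_grad=True)
    m = BinaryStep.apply(m_tilde)   # binary connectivity mask, still trainable via equation (5)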
Similarly, in training the weight variables w̃_1^(j) to w̃_{|Cj|}^(j), the error is back-propagated to compute the weight variable slopes ∂L/∂w̃ of the objective function L with respect to the weight variables w̃_1^(j) to w̃_{|Cj|}^(j), and the weight variables are then adjusted according to their slopes, thereby reducing the sum of the weight variables w̃_1^(j) to w̃_{|Cj|}^(j) while reducing the performance loss of the artificial neural network 1. A weight variable w̃ may be adjusted continuously until its weight variable slope ∂L/∂w̃ reaches 0, so as to find a local minimum of the cross entropy CE. According to equation (2) and the chain rule, the weight variable slope ∂L/∂w̃ can be expressed by equation (6):
∂L/∂w̃ = (∂L/∂w) ☉ m    (6)
According to equation (6), when the connectivity mask m is 0, the weight variable slope ∂L/∂w̃ is 0, so the weight variable w̃ cannot be updated and the training procedure stalls. To keep the weight variable w̃ trainable, the weight variable slope ∂L/∂w̃ is redefined during back propagation as the weight slope ∂L/∂w of the objective function L with respect to the weight w, as expressed by equation (7):
∂L/∂w̃ = ∂L/∂w    (7)
By redefining the weight variable slope ∂L/∂w̃ as the weight slope ∂L/∂w, the weight variable w̃ remains trainable even when the connectivity mask m is 0. Referring to Fig. 2, the dashed line between the weight w and the weight variable w̃ indicates that the element-wise multiplication is skipped during back propagation. The weight slope ∂L/∂w can be obtained by back propagation. Whether the connectivity mask m is 1 or 0, the weight variable w̃ can be updated according to the weight slope ∂L/∂w. In this way, the weight variables w̃_1^(j) to w̃_{|Cj|}^(j) can be trained even if some of them are temporarily masked by 0.
The artificial neural network 1 decomposes the weight w into a connectivity variable m̃ and a weight variable w̃, trains the connectivity variable m̃ to form a sparse connection structure, and trains the weight variable w̃ to produce a simple model of the artificial neural network 1. Furthermore, to keep the connectivity variable m̃ and the weight variable w̃ trainable, the connectivity variable slope ∂L/∂m̃ is redefined as the connectivity mask slope ∂L/∂m, and the weight variable slope ∂L/∂w̃ is redefined as the weight slope ∂L/∂w. The resulting sparse connection structure of the artificial neural network 1 can significantly reduce computational complexity, memory requirements and power consumption.
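Both redefined slopes of equations (5) and (7) can be combined into a single custom autograd function, sketched below under the same PyTorch assumption. In the forward pass it computes w = w̃ ☉ H(m̃); in the backward pass it returns ∂L/∂w for the weight variable (equation (7)) and ∂L/∂w ☉ w̃ for the connectivity variable (equation (5)). The class name SparseWeight is illustrative.

    import torch

    class SparseWeight(torch.autograd.Function):
        """Forward: w = w_tilde * H(m_tilde). Backward: the redefined slopes of
        equations (5) and (7), so both variables remain trainable."""

        @staticmethod
        def forward(ctx, w_tilde, m_tilde):
            m = (m_tilde > 0).float()
            ctx.save_for_backward(w_tilde)
            return w_tilde * m

        @staticmethod
        def backward(ctx, grad_w):
            (w_tilde,) = ctx.saved_tensors
            grad_w_tilde = grad_w              # equation (7): dL/dw_tilde = dL/dw
            grad_m_tilde = grad_w * w_tilde    # equation (5): dL/dm_tilde = dL/dw ☉ w_tilde
            return grad_w_tilde, grad_m_tilde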
Fig. 3 is a flowchart of a training method 300 of the artificial neural network 1. The method 300 includes steps S302 to S306 for training the artificial neural network 1 to form a sparse connection structure. Step S302 is applied to a convolutional layer of the artificial neural network 1 to generate an output estimate, and steps S304 and S306 are applied to train the connectivity variables m̃_1^(j) to m̃_{|Cj|}^(j) and the weight variables w̃_1^(j) to w̃_{|Cj|}^(j). Any reasonable technical change or step adjustment falls within the scope of the present disclosure. Steps S302 to S306 are explained below:
Step S302: the convolutional layer computes an output estimate according to a weight w, the weight w being defined by a weight variable w̃ and a connectivity mask m, the connectivity mask m being derived from a connectivity variable m̃;
Step S304: the connectivity variables m̃_1^(j) to m̃_{|Cj|}^(j) are adjusted according to the objective function L to reduce the total number of connections and reduce the performance loss;
Step S306: the weight variables w̃_1^(j) to w̃_{|Cj|}^(j) are adjusted according to the objective function L to reduce the sum of the weight variables w̃_1^(j) to w̃_{|Cj|}^(j).
Detailed explanations of steps S302 to S306 have been provided in the previous paragraphs and are not repeated here. The training method 300 trains the connectivity variables m̃_1^(j) to m̃_{|Cj|}^(j) and the weight variables w̃_1^(j) to w̃_{|Cj|}^(j) separately to produce an artificial neural network 1 with sparse connections, a simple structure and accurate output predictions.
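The following sketch puts steps S302 to S306 together in a minimal single-layer training loop (again assuming PyTorch), reusing the BinaryStep and SparseWeight sketches above. The data loader, tensor shapes, learning rate and attenuation coefficients are illustrative assumptions only.

    import torch
    import torch.nn.functional as F

    # Illustrative setup: `loader` is an assumed data loader of (input, target) mini-batches.
    w_tilde = torch.randn(10, 3, 1, 1, requires_grad=True)   # weight variables
    m_tilde = torch.randn(10, 3, 1, 1, requires_grad=True)   # connectivity variables
    optimizer = torch.optim.SGD([w_tilde, m_tilde], lr=0.01)
    lambda1, lambda2 = 1e-3, 1e-4                             # attenuation coefficients

    for x, target in loader:
        w = SparseWeight.apply(w_tilde, m_tilde)              # step S302: masked weight w
        y = F.conv2d(x, w).flatten(1)                         # step S302: output estimates
        mask = BinaryStep.apply(m_tilde)                      # mask with pass-through slope
        loss = (F.cross_entropy(y, target)                    # performance loss CE
                + lambda1 * mask.sum()                        # L0 term: total connections
                + lambda2 * w_tilde.pow(2).sum())             # L2 term on weight variables
        optimizer.zero_grad()
        loss.backward()                                       # steps S304 and S306: compute slopes
        optimizer.step()                                      # adjust both sets of variables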
Fig. 4 shows an embodiment of a computing network 4 for constructing the artificial neural network 1. The computing network 4 includes a processor 402, a program memory 404, a parameter memory 406 and an output interface 408. The program memory 404 and the parameter memory 406 may be non-volatile memories. The processor 402 may be coupled to the program memory 404, the parameter memory 406 and the output interface 408 to control their operations. The weights w_1^(j) to w_{|Cj|}^(j), the weight variables w̃_1^(j) to w̃_{|Cj|}^(j), the connectivity masks m_1^(j) to m_{|Cj|}^(j), the connectivity variables m̃_1^(j) to m̃_{|Cj|}^(j) and the associated slopes may be stored in the parameter memory 406, while instructions related to training the connectivity variables m̃_1^(j) to m̃_{|Cj|}^(j) and the weight variables w̃_1^(j) to w̃_{|Cj|}^(j) may be loaded from the program memory 404 into the processor 402 during the training process. The instructions may include code for causing the convolutional layer to compute an output estimate according to a weight w defined by a weight variable w̃ and a connectivity mask m, code for adjusting the connectivity variables m̃_1^(j) to m̃_{|Cj|}^(j) according to the objective function L, and code for adjusting the weight variables w̃_1^(j) to w̃_{|Cj|}^(j) according to the objective function L. The adjusted connectivity variables m̃_1^(j) to m̃_{|Cj|}^(j) and weight variables w̃_1^(j) to w̃_{|Cj|}^(j) may be written back to the parameter memory 406 to replace the old data. The output interface 408 may display the output estimates y_1^(J) to y_{|NJ|}^(J) in response to an input data set.
The artificial neural network 1 and the training method 300 train the connectivity variables m̃_1^(j) to m̃_{|Cj|}^(j) and the weight variables w̃_1^(j) to w̃_{|Cj|}^(j) to generate a sparsely connected network while producing correct output values.
The above-mentioned embodiments are only preferred embodiments of the present invention, and all equivalent changes and modifications made within the scope of the claims of the present invention should be covered by the present invention.

Claims (10)

1. A method for training a sparsely connected neural network, the method being for training a computing network comprising a plurality of convolutional layers, the method comprising:
calculating an output estimate for one of the plurality of convolutional layers based on a weight defined by a weight variable and a connectivity mask representing a connection between the one of the plurality of convolutional layers and a previous convolutional layer, and the connectivity mask being derived from a connectivity variable; and
adjusting a plurality of connectivity variables according to an objective function to reduce a total number of connections between the plurality of convolutional layers and to reduce a performance loss representing a difference between the output estimate and a target value.
2. The method of claim 1, wherein adjusting the plurality of connectivity variables according to the objective function comprises:
calculating a connectivity mask slope of the objective function with respect to the connectivity mask; and
updating the connectivity variable according to the connectivity mask slope.
3. The method of claim 1, further comprising:
the convolutional layer binarizes the connectivity variable according to a unit step function to generate the connectivity mask.
4. The method of claim 1, wherein the objective function comprises a first term corresponding to the performance loss and a second term corresponding to regularization of connectivity masks associated with the connections between the convolutional layers.
5. The method of claim 4, wherein the second term comprises a product of a connection attenuation coefficient and a sum of the plurality of connectivity masks associated with the plurality of connections between the plurality of convolutional layers.
6. The method of claim 4, wherein the objective function further comprises a third term corresponding to regularization of weight variables associated with the connections between the convolutional layers.
7. The method of claim 6, wherein the third term comprises a product of a weight attenuation coefficient and a sum of the weight variables associated with the connections between the convolutional layers.
8. The method of claim 1, wherein the performance loss is a cross entropy.
9. The method of claim 1, further comprising:
adjusting a plurality of weight variables associated with the plurality of connections between the plurality of convolutional layers according to the objective function to reduce a sum of the plurality of weight variables.
10. The method of claim 9, wherein adjusting the plurality of weight variables according to the objective function comprises:
calculating a weight slope of the objective function to the weight; and
updating the weight variable according to the weight slope.
CN202010123340.9A 2019-05-23 2020-02-27 Method for training sparse connection neural network Pending CN111985603A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201962851652P 2019-05-23 2019-05-23
US62/851,652 2019-05-23
US16/746,941 2020-01-19
US16/746,941 US20200372363A1 (en) 2019-05-23 2020-01-19 Method of Training Artificial Neural Network Using Sparse Connectivity Learning

Publications (1)

Publication Number Publication Date
CN111985603A true CN111985603A (en) 2020-11-24

Family

ID=73441727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010123340.9A Pending CN111985603A (en) 2019-05-23 2020-02-27 Method for training sparse connection neural network

Country Status (1)

Country Link
CN (1) CN111985603A (en)

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination