WO2020195940A1 - Dispositif de réduction de modèle de réseau neuronal - Google Patents

Dispositif de réduction de modèle de réseau neuronal (Neural network model reduction device)

Info

Publication number
WO2020195940A1
WO2020195940A1 (PCT/JP2020/011067, JP2020011067W)
Authority
WO
WIPO (PCT)
Prior art keywords
contraction
model
weight matrix
parameter
processing unit
Prior art date
Application number
PCT/JP2020/011067
Other languages
English (en)
Japanese (ja)
Inventor
晶子 正木
豪一 小野
光祥 猪貝
Original Assignee
株式会社日立ソリューションズ・テクノロジー
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立ソリューションズ・テクノロジー
Publication of WO2020195940A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • The present invention relates to a model contraction device for a neural network.
  • Patent Document 1 discloses devices that ensure the performance of a neural network satisfies a predetermined requirement, reduce the amount of matrix-multiplication computation by reducing the dimensions of the matrices, and thereby improve the processing speed of the neural network.
  • A method has also been developed that reduces the amount of computation by omitting calculations related to weights that have low sensitivity to the inference result of the neural network.
  • Such methods use, as reduction criteria, mutually independent indicators that express properties of the neural network model, such as the sparseness of the network structure, its cardinality, or matrix characteristics (singular values, principal components, eigenvalues, and the like). Examples are the methods known as neuron pruning, synapse pruning, and low-rank approximation. These are essentially the same approach as the computation-reduction methods described above.
  • Related work is also reported in Non-Patent Document 1.
  • An object of the present invention is therefore to provide a neural network model contraction device or the like capable of suppressing an increase in the design period for model contraction while combining a plurality of contraction methods.
  • To this end, the model contraction device for a neural network includes a first contraction processing unit that performs a first contraction process, which changes the elements of a weight matrix using a first contraction parameter, and updates the weight matrix, and a second contraction processing unit that performs a second contraction process, which reduces the size of the updated weight matrix using a second contraction parameter, and deforms the network shape in accordance with the reduced weight matrix.
  • FIG. 1 is a diagram illustrating an overall picture of a general neural network.
  • As an example, a convolutional neural network for still images is considered.
  • The still image given as input data is classified by the convolutional neural network into classes defined by the user.
  • the neural network of FIG. 1 has an input layer L10, hidden layers L20 to L90, and an output layer L100. These layers are realized on an arithmetic unit such as a processor by executing software, for example.
  • the neural network may include a plurality of each layer illustrated here.
  • The reference signs of the layers are assigned for convenience and do not indicate the number of layers in the neural network.
  • the arithmetic unit transforms the input image into image data suitable for the arithmetic of the neural network. Then, the arithmetic unit stores the deformed image data in the storage unit.
  • Each layer such as the input layer L10 has a weight matrix corresponding to each layer.
  • the input layer L10 executes a convolution operation using a weight matrix on the deformed image data.
  • Although the details are not described here, the input layer L10 executes batch processing, processing related to the bias term, and the like in addition to the convolution operation.
  • the image data transitions to the state of the feature map.
  • the shallowest hidden layer L20 executes arithmetic processing on the feature map.
  • the output layer L100 calculates the probability distribution of each class to be classified for the input image by using, for example, an output function.
  • the output layer L100 outputs the classification result for the input image.
  • the arithmetic unit determines what kind of image the input image is.
  • FIG. 2 is a diagram illustrating a weight matrix of a neural network.
  • FIG. 2A is a diagram for explaining an operation using a weight matrix
  • FIG. 2B is a diagram illustrating a weight matrix.
  • Each neuron included in N1 and N3 in FIG. 2A is an arithmetic unit that returns a predetermined value for an input. Weights are assigned to each synapse included in S2. The product of the return value from each neuron of N1 in the previous layer and the weight of each corresponding synapse is input to each neuron of N3 in the next layer and summed.
  • Such arithmetic processing can be represented by a matrix operation, and the weights of each synapse summarized in a matrix format are represented as the weight matrix W in FIG. 2B.
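  • As a concrete illustration of this matrix form, the following sketch (Python/NumPy; the layer sizes and random values are hypothetical) computes the inputs to the N3 neurons as a single matrix-vector product of the N1 outputs with the weight matrix W:

```python
import numpy as np

rng = np.random.default_rng(0)

n_prev, n_next = 4, 3                   # hypothetical numbers of neurons in N1 and N3
x = rng.normal(size=n_prev)             # return values of the N1 neurons
W = rng.normal(size=(n_next, n_prev))   # one weight per synapse in S2

# Each N3 neuron receives the weighted sum of all N1 outputs:
# z_i = sum_j W[i, j] * x[j], i.e. a single matrix-vector product.
z = W @ x
print(z.shape)  # (3,)
```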
  • The size of the weight matrix and the weights of its elements differ from layer to layer.
  • In the following, the symbol "W" is used for the weight matrix; in particular, a weight matrix that has been analytically updated is denoted by "W" with a tilde ("~") accent.
  • the convolutional neural network of still images is given as an example, but the input data is not limited to still images.
  • The neural network can perform not only the above-mentioned image recognition but also voice recognition, natural language processing, and recognition of the surrounding environment such as temperature, humidity, and fluid flow rate.
  • the type of neural network is not limited to convolutional neural networks, and can be applied as long as it is an operation that can be defined in a matrix format.
  • the output value in the output layer L100 is not limited to the classification, but can be changed according to the user's purpose such as the object detection result and the voice recognition result.
  • A model of a neural network (hereinafter sometimes simply referred to as a "model") includes a network shape and a weight matrix for each layer of the neural network. As described in detail later, the weight matrix is optimized by learning so as to satisfy a predetermined recognition accuracy set by the user.

<Configuration of model contraction device>
  • the model contraction device 1 is a functional block that performs model contraction of a neural network.
  • The model contraction process described below is performed on the input layer L10 through the deepest hidden layer L90 in FIG. 1.
  • FIG. 3 is a block diagram showing an example of the configuration of the model contraction device of the neural network according to the first embodiment of the present invention.
  • The model contraction device 1 includes a learning/evaluation control unit (control unit) 10, a first contraction parameter reception unit 20, a weight calculation processing unit (first contraction processing unit) 30, a second contraction parameter reception unit 40, a network transformation/resynthesis processing unit (second contraction processing unit) 50, a weight matrix storage unit 60, a learning processing unit 70, an inference accuracy evaluation unit 80, and the like.
  • Each functional block constituting the model contraction device 1 other than the weight matrix storage unit 60 is realized, for example, by the processor executing software. Further, each functional block may be realized by hardware, or may be realized by cooperation between hardware and software.
  • a model storage unit 100, a learning data storage unit 71, an inference data storage unit 81, an input processing unit 90, and the like are connected to the model contraction device 1.
  • the model storage unit 100, the learning data storage unit 71, and the inference data storage unit 81 may be provided independently of each other, or may be integrally configured with these.
  • the weight matrix storage unit 60 in the model contraction device 1 may also be independent of these storage units, or may be integrally configured with these storage units.
  • the weight matrix storage unit 60 may be provided outside the model contraction device 1.
  • the input processing unit 90 is a functional block that specifies the contraction method executed in the model contraction device 1.
  • the input processing unit 90 is connected to, for example, an input interface, and notifies the learning / evaluation control unit 10 of a plurality of (for example, two types) reduction methods selected by the user via the input interface. Further, when the reduction parameter corresponding to the reduction method selected by the user is input, the input processing unit 90 outputs the input reduction parameter to the learning / evaluation control unit 10.
  • the contraction parameter is a parameter that determines the contraction rate of the model.
  • the contraction parameters are, for example, a threshold value of the contraction rate, a step size of the contraction rate, a calculation amount reduction rate, and the like, and those suitable for the contraction method selected by the user are appropriately set.
  • the model storage unit 100 is a storage medium for storing the model of the neural network. As shown in FIG. 3, the model storage unit 100 includes a weight matrix storage unit 101 that stores a weight matrix and a network shape storage unit 102 that stores a network shape. Although the weight matrix and network shape are appropriately updated by the model contraction process, the model storage unit 100 may store only the updated weight matrix and network shape, or stores the weight matrix and network shape before and after the update. Each may be stored.
  • The learning/evaluation control unit 10 is a functional block that controls processing related to learning and evaluation of the weight matrix, processing related to model contraction, and the like. As processing related to model contraction, the learning/evaluation control unit 10 assigns the two contraction methods notified from the input processing unit 90: for example, it assigns the contraction executed by the weight calculation processing unit 30 as the first contraction and the contraction executed by the network transformation/resynthesis processing unit 50 as the second contraction. The learning/evaluation control unit 10 then sets the contraction parameters (first contraction parameter, second contraction parameter) corresponding to the first contraction and the second contraction, respectively.
  • The learning/evaluation control unit 10 outputs, to the first contraction parameter reception unit 20, a first contraction notification indicating the contraction method set as the first contraction, together with the first contraction parameter corresponding to the first contraction.
  • Similarly, the learning/evaluation control unit 10 outputs, to the second contraction parameter reception unit 40, a second contraction notification indicating the contraction method set as the second contraction, together with the second contraction parameter corresponding to the second contraction.
  • As processing related to learning and evaluation of the weight matrix, the learning/evaluation control unit 10 determines whether or not to continue the model contraction process based on the evaluation results for the weight matrix after model contraction obtained by the learning processing unit 70 and the inference accuracy evaluation unit 80.
  • When continuing, the learning/evaluation control unit 10 resets the contraction parameters and continues the model contraction process.
  • the first contraction parameter receiving unit 20 outputs the first contraction notification and the first contraction parameter input from the learning / evaluation control unit 10 to the weight calculation processing unit 30.
  • The weight calculation processing unit 30 is a functional block that performs the first contraction process on the weight matrix stored in the weight matrix storage unit 101, based on the first contraction notification and the first contraction parameter input from the first contraction parameter reception unit 20. In the first contraction process, the weight calculation processing unit 30 updates the weight matrix by updating its individual elements without changing the size of the weight matrix. The update of the weight matrix in the weight calculation processing unit 30 is performed analytically, not by optimization such as learning. Moreover, since the matrix size does not change, the network shape does not change in the first contraction. The weight calculation processing unit 30 stores the updated weight matrix in the weight matrix storage unit 60.
  • the second contraction parameter reception unit 40 outputs the second contraction notification input from the learning / evaluation control unit 10 and the second contraction parameter to the network transformation / resynthesis processing unit 50.
  • The network transformation/resynthesis processing unit 50 is a functional block that controls processing related to network transformation and resynthesis, based on the second contraction notification and the second contraction parameter input from the second contraction parameter reception unit 40.
  • the network transformation / resynthesis processing unit 50 performs the second reduction with respect to the weight matrix updated by the weight calculation processing unit 30, reduces the size of the weight matrix, and transforms the weight matrix. Then, the network deformation / resynthesis processing unit 50 resynthesizes the network based on the deformed weight matrix and updates the network shape.
  • the network transformation / resynthesis processing unit 50 stores the updated network shape in the network shape storage unit 102.
  • the learning data storage unit 71 stores learning data and the like for performing learning processing on the weight matrix.
  • the learning processing unit 70 is a functional block that performs learning processing on the weight matrix stored in the weight matrix storage unit 101 by using the learning data of the learning data storage unit 71.
  • the learning processing unit 70 executes analysis processing using a weight matrix based on the learning data. Then, the learning processing unit 70 compares the analysis result with the learning data, optimizes the weight of each element in the weight matrix, and updates the weight matrix.
  • the inference accuracy evaluation by the inference accuracy evaluation unit 80 is performed on the weight matrix stored in the weight matrix storage unit 101, as will be described later.
  • the inference accuracy evaluation result is input to the learning / evaluation control unit 10, and during the learning process, the learning processing unit 70 receives control based on the inference accuracy evaluation result from the learning / evaluation control unit 10.
  • the inference data storage unit 81 stores inference accuracy evaluation data used for inference accuracy evaluation for the weight matrix.
  • the inference accuracy evaluation unit 80 evaluates the inference accuracy of the weight matrix stored in the weight matrix storage unit 101 using the inference accuracy evaluation data as test data.
  • The inference accuracy evaluation unit 80 outputs the inference accuracy evaluation result to the learning/evaluation control unit 10.

<Model contraction method>
  • FIG. 4 is a flow chart showing an outline of the model contraction method according to the first embodiment of the present invention.
  • the flow of FIG. 4 includes steps S10 to S60.
  • the first contraction parameter used for the first contraction process is set (step S10).
  • the learning / evaluation control unit 10 reads the initial value of the contraction parameter for the first contraction from the non-volatile memory (not shown), and sets the read initial value as the first contraction parameter. Then, the learning / evaluation control unit 10 outputs the set first contraction parameter to the first contraction parameter reception unit 20. In addition, the learning / evaluation control unit 10 may set a value set by the user as the first contraction parameter.
  • the weight calculation processing unit 30 performs the first contraction processing using the first contraction parameter set in step S10, and changes each element of the weight matrix (step S20).
  • the weight calculation processing unit 30 generates a weight matrix composed of each changed element and updates the weight matrix.
  • the weight calculation processing unit 30 stores the updated weight matrix in the weight matrix storage unit 60.
  • the second contraction parameter used for the second contraction process is set (step S30).
  • the learning / evaluation control unit 10 reads the initial value of the contraction parameter for the second contraction from the non-volatile memory (not shown), and sets the read initial value as the second contraction parameter. Then, the learning / evaluation control unit 10 outputs the set second contraction parameter to the second contraction parameter reception unit 40. In addition to this, the learning / evaluation control unit 10 may set a value set by the user as the second contraction parameter.
  • The network transformation/resynthesis processing unit 50 performs the second contraction process using the second contraction parameter set in step S30, and reduces the size of the updated weight matrix stored in the weight matrix storage unit 60 (step S40). Further, the network transformation/resynthesis processing unit 50 reads the network shape stored in the network shape storage unit 102, deforms the read network shape in accordance with the weight matrix, and resynthesizes the network. The network transformation/resynthesis processing unit 50 stores the reduced weight matrix in the weight matrix storage unit 101 of the model storage unit 100 and stores the deformed network shape in the network shape storage unit 102.
  • The learning processing unit 70 reads the learning data from the learning data storage unit 71 and performs learning processing on the reduced weight matrix (step S50). Specifically, the learning processing unit 70 uses the learning data as input data and executes arithmetic processing using the reduced weight matrix stored in the weight matrix storage unit 101. The learning processing unit 70 takes the data of the output layer L100, or the data output from the output layer L100, as the calculation result, compares the calculation result with the learning data, and optimizes the weight matrix. Further, the learning processing unit 70 outputs the learning result for the weight matrix to the learning/evaluation control unit 10.
  • The inference accuracy evaluation unit 80 reads the inference accuracy evaluation data from the inference data storage unit 81 and performs inference accuracy evaluation on the reduced weight matrix. Specifically, the inference accuracy evaluation unit 80 uses the inference accuracy evaluation data as input data and executes arithmetic processing using the reduced weight matrix stored in the weight matrix storage unit 101. The inference accuracy evaluation unit 80 takes the data of the output layer L100, or the data output from the output layer L100, as the calculation result, compares it with the inference accuracy evaluation data, performs the inference accuracy evaluation, and outputs the inference accuracy evaluation result to the learning/evaluation control unit 10. The inference accuracy evaluation by the inference accuracy evaluation unit 80 is performed a plurality of times at predetermined intervals.
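  • The evaluation itself can be pictured as in the following sketch (Python/NumPy; the model_forward callable and the label format are assumptions for illustration): it compares the class predicted using the reduced weight matrices with the ground-truth label of each evaluation sample.

```python
import numpy as np

def evaluate_accuracy(model_forward, eval_inputs, eval_labels):
    """Fraction of evaluation samples whose predicted class matches the label.

    model_forward: hypothetical callable mapping one input sample to class
    scores (assumed to use the reduced weight matrices internally).
    """
    correct = 0
    for x, label in zip(eval_inputs, eval_labels):
        scores = model_forward(x)               # output of the output layer L100
        if int(np.argmax(scores)) == int(label):
            correct += 1
    return correct / len(eval_labels)
```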
  • The learning/evaluation control unit 10 uses the input learning result and inference accuracy evaluation result to determine whether to continue or end the model contraction process (step S60). For example, when the learning/evaluation control unit 10 refers to the learning result and determines that the weight of each element in the weight matrix should be changed (case (1)), it returns to step S10, resets the first contraction parameter and the second contraction parameter, and continues the model contraction process using the reset contraction parameters.
  • When the weights of the elements in the weight matrix do not need to be changed but the learning/evaluation control unit 10, referring to the inference accuracy evaluation result, determines that the inference accuracy is lower than a predetermined threshold or has dropped sharply (case (2)), it returns to step S30, resets only the second contraction parameter, and continues the model contraction process using the reset parameter. On the other hand, when the weights of the elements in each layer do not need to be changed and the inference accuracy is higher than the predetermined threshold (case (3)), the learning/evaluation control unit 10 determines that no contraction parameter needs to be reset and ends the model contraction process (END).
  • Alternatively, the learning/evaluation control unit 10 may use, as determination items, whether the inference accuracy has decreased a plurality of times in succession (for example, three times or more) and whether the contraction rate has reached its target value. Specifically, when it determines that the inference accuracy has decreased three or more times in a row and the model contraction rate has not reached the target (case (1)), the learning/evaluation control unit 10 returns to step S10, resets the first contraction parameter and the second contraction parameter, and continues the model contraction process using the reset parameters.
  • When the learning/evaluation control unit 10 determines that the inference accuracy has decreased three or more times in a row or the model contraction rate has not reached the target (case (2)), it returns to step S30, resets only the second contraction parameter, and continues the model contraction process using the reset parameter.
  • When the learning/evaluation control unit 10 determines that the inference accuracy has not decreased three or more times in a row and the model contraction rate has reached the target (case (3)), it determines that no contraction parameter needs to be reset and ends the model contraction process (END).
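  • As a rough picture of this control flow, the sketch below mirrors the loop of FIG. 4 in Python. Every callable (first_contraction, second_contraction, train, evaluate, control) is a hypothetical stand-in for the corresponding functional block of FIG. 3; no particular contraction method or decision rule is assumed.

```python
def contract_model(weights, shape, first_contraction, second_contraction,
                   train, evaluate, control, p1, p2, max_iters=10):
    """Schematic sketch of the FIG. 4 flow; all callables are hypothetical stand-ins."""
    for _ in range(max_iters):
        updated = first_contraction(weights, p1)                 # step S20: change elements, keep size
        reduced, shape = second_contraction(updated, shape, p2)  # step S40: shrink matrix, deform network
        reduced = train(reduced, shape)                          # step S50: learning process
        accuracy = evaluate(reduced, shape)                      # inference accuracy evaluation
        # Step S60: the control block either ends the loop (case (3)),
        # resets both parameters (case (1)), or resets only p2 (case (2)).
        done, p1, p2 = control(accuracy, reduced, p1, p2)
        if done:
            break
    return reduced, shape
```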
  • The determination items for whether or not to continue the model contraction process can be set arbitrarily by the user.
  • The learning/evaluation control unit 10 may set values input by the user as the contraction parameters, or the first contraction parameter and the second contraction parameter may be reset automatically when the model contraction rate is reset.
  • The learning/evaluation control unit 10 may also determine whether to continue or end the model contraction process without using the inference accuracy evaluation result.

<<Specific example of model contraction method (1)>>
  • FIG. 5 is a flow chart showing an example of the model contraction method. Since FIG. 5 corresponds to FIG. 4, the reference numerals of the steps in FIG. 5 match those of FIG. 4. In this example, low-rank approximation is assigned as the first contraction process, and neuron pruning is assigned as the second contraction process.
  • In step S10, the matrix rank threshold is set as the first contraction parameter. Then, the weight calculation processing unit 30 performs low-rank approximation using the matrix rank threshold and derives a low-rank version of the weight matrix (step S20).

<<<Derivation method of the low-rank matrix>>>
  • The weight calculation processing unit 30 performs singular value decomposition of the weight matrix using the following equation (1): W = U S ᵗV … (1), where U is the matrix of left singular vectors, ᵗV is the transpose of the matrix of right singular vectors, and S is the diagonal matrix of singular values.
  • the diagonal component of the singular value diagonal matrix S is composed of L singular values corresponding to the rank number L of the weight matrix.
  • The weight calculation processing unit 30 replaces with "0" the singular values smaller than a predetermined threshold D, that is, the singular values corresponding to components whose contribution to the amount of information is low, and resynthesizes the weight matrix using the replaced values. In this way, the weight calculation processing unit 30 generates a low-rank weight matrix.
  • The threshold D may be set appropriately by the user, for example as a ratio to the number of ranks L, as an absolute value relative to the magnitude of the singular values, or as a threshold on the Frobenius norm.
  • Each component of the weight matrix lowered in this way is represented by the following equations (2) and (3).
  • Methods other than the above may also be used, such as principal component analysis, eigenvalue decomposition, or QR decomposition; the user can select among these methods as appropriate.
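  • A minimal sketch of the SVD-based thresholding described above (Python/NumPy; the threshold D and the matrix shape are arbitrary illustrative values, and equations (2) and (3) of the original are not reproduced):

```python
import numpy as np

def low_rank_by_svd(W, threshold_d):
    """Zero out singular values below threshold_d and resynthesize the matrix.

    The matrix keeps its original size; only its elements (and its effective
    rank) change, which is why this fits the first contraction process.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)   # W = U @ diag(s) @ Vt
    s_reduced = np.where(s < threshold_d, 0.0, s)      # drop low-contribution components
    return U @ np.diag(s_reduced) @ Vt

W = np.random.default_rng(0).normal(size=(6, 8))
W_low = low_rank_by_svd(W, threshold_d=1.0)
print(np.linalg.matrix_rank(W), np.linalg.matrix_rank(W_low))
```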
  • step S30 the reduction ratio of the matrix size is set as the second reduction parameter.
  • The network transformation/resynthesis processing unit 50 performs neuron pruning on the low-rank matrix derived in step S20, based on the matrix size reduction ratio, and reduces the size of the low-rank matrix (step S40).
  • As the unit of the matrix size reduction ratio, for example, the number of neurons or the calculation amount reduction rate is used. The user can select the unit of the reduction ratio as appropriate.
  • As a method of deleting neurons, for example, the method called "quantization pruning", described below, is used.
  • Alternatively, a method using a norm of the weights, which are the elements of the weight matrix, may be adopted.
  • In that case, the L1 norm or L2 norm of the weights entering each neuron is used as the evaluation value, and the neurons with the lowest evaluation values, up to the reduction ratio, are deleted.
  • These evaluation values are calculated by, for example, the weight calculation processing unit 30.
  • Note that, because the weight matrix has been reduced in rank in step S20, neuron pruning based on the evaluation value may affect the matrix rank.
  • The evaluation value may therefore be any value computed from the matrix elements of the low-rank weight matrix. It is also possible to perform synapse pruning at the same time as neuron pruning. Further, since the purpose is to reduce the matrix size, neuron pruning may delete rows of the weight matrix, or it may delete columns.
  • The network transformation/resynthesis processing unit 50 deforms the network shape in accordance with the reduced weight matrix.
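  • A minimal sketch of this norm-based neuron pruning (Python/NumPy; the reduction ratio, the choice of the L2 norm, and the decision to delete rows rather than columns are illustrative assumptions):

```python
import numpy as np

def prune_neurons_by_norm(W, reduction_ratio):
    """Delete the rows (neurons) of W with the smallest L2 norm of incoming weights.

    reduction_ratio is the fraction of neurons to remove; deleting rows is one
    of the options mentioned above (columns could be deleted instead).
    """
    scores = np.linalg.norm(W, ord=2, axis=1)     # evaluation value per output neuron
    n_remove = int(round(reduction_ratio * W.shape[0]))
    keep = np.argsort(scores)[n_remove:]          # indices of surviving neurons
    keep.sort()                                   # preserve the original ordering
    return W[keep, :], keep

W = np.random.default_rng(1).normal(size=(8, 5))
W_small, kept = prune_neurons_by_norm(W, reduction_ratio=0.25)
print(W.shape, "->", W_small.shape)   # (8, 5) -> (6, 5)
```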
  • FIG. 6 is an explanatory diagram of model contraction by neuron pruning and synapse pruning.
  • FIG. 6A shows a specific example of reduction by neuron pruning and synapse pruning, and FIG. 6B illustrates the weight matrix whose size has been reduced after neuron pruning and synapse pruning.
  • the part indicated by the broken line in FIG. 6A shows the deleted neurons and synapses.
  • model contraction removes the central neuron in N1 and the second neuron from the left in N3.
  • synapses connected to deleted neurons have also been deleted.
  • By performing this model contraction on the weight matrix, the size of the weight matrix is reduced as shown in FIG. 6B.
<<<Quantization pruning>>>
  • Quantization pruning is performed while leaving the weights discrete, as the activated neurons do not necessarily respond only to large weights. Quantization pruning can be applied not only to neuron pruning but also to synaptic pruning.
  • FIG. 7 is a diagram illustrating an execution procedure of quantization pruning.
  • the vertical axis of FIG. 7 is the evaluation value of each neuron.
  • As the evaluation value for example, the sum of the weights entering the neuron is used.
  • the horizontal axis of FIG. 7 is the neuron number. In the example of FIG. 7, in order to facilitate the explanation, the neuron number increases as the evaluation value increases. In the case of synapse cutting, the vertical axis may be used as the weight.
  • In the example of FIG. 7, each neuron is classified into one of six clusters.
  • the neuron having the maximum evaluation value in each cluster is left, and the other neurons are deleted.
  • the neuron at the right end of each cluster is left as the representative neuron.
  • the evaluation values of the deleted neurons are hatched. After neuron deletion, the remaining representative neurons are reassigned with neuron numbers, and the distribution of evaluation values is updated as shown in FIG. 7 (b).
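  • A minimal sketch of this procedure (Python/NumPy; clustering the evaluation values by equal-width binning is an illustrative assumption, as the text does not fix a clustering algorithm):

```python
import numpy as np

def quantization_pruning(evaluations, n_clusters=6):
    """Keep, in each cluster of evaluation values, only the neuron with the
    maximum evaluation (FIG. 7); the other neurons in the cluster are deleted.

    Clusters are formed here by equal-width binning of the evaluation values,
    purely as an illustrative choice.
    """
    edges = np.linspace(evaluations.min(), evaluations.max(), n_clusters + 1)
    cluster_ids = np.digitize(evaluations, edges[1:-1])   # 0 .. n_clusters-1
    keep = []
    for c in range(n_clusters):
        members = np.flatnonzero(cluster_ids == c)
        if members.size:
            keep.append(members[np.argmax(evaluations[members])])  # representative neuron
    return np.sort(np.array(keep))   # surviving neuron indices (renumbered afterwards)

ev = np.random.default_rng(2).uniform(0.0, 1.0, size=20)   # e.g. sum of incoming weights
print(quantization_pruning(ev))
```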
  • FIG. 8 is an explanatory diagram showing another example of the execution procedure of the quantized pruning.
  • The example of FIG. 8 is the same as that of FIG. 7 up to the point where the neurons are classified into clusters. After clustering, in each cluster the neuron whose evaluation value is closest to the centroid value, which is the average value of the weights, is left as the representative neuron, and the other neurons are deleted. The evaluation value of the representative neuron is then overwritten with the centroid value. After the neurons are deleted, neuron numbers are reassigned to the remaining representative neurons, and the distribution of evaluation values is updated as shown in FIG. 7(b).

<<Specific example of model contraction method (2)>>
  • FIG. 9 is a flow chart showing an example of the model contraction method. Since FIG. 9 corresponds to FIG. 4, the reference numerals corresponding to each step of FIG. 9 are matched with those of FIG.
  • In the specific example (1), low-rank approximation was assigned as the first contraction process and neuron pruning as the second contraction process; in this example, synapse pruning is assigned as the first contraction process and low-rank approximation as the second contraction process.
  • In this way, the contraction can be executed even when the contents of the first contraction process and the second contraction process are exchanged.
  • In step S10, the connection cutting ratio is set as the first contraction parameter. Then, the weight calculation processing unit 30 performs synapse pruning using the connection cutting ratio and changes the elements of the weight matrix. As a result, the weight calculation processing unit 30 updates the weight matrix with a reduced amount of information (step S20). In step S20, the weight calculation processing unit 30 performs synapse pruning using, for example, a method such as the quantization pruning with the centroid value described with reference to FIG. 8.
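  • A minimal sketch of synapse pruning driven by a connection cutting ratio (Python/NumPy). The simplest magnitude-based selection is used here purely for illustration; as noted above, the centroid-based quantization pruning of FIG. 8 could be applied to the individual weights instead.

```python
import numpy as np

def prune_synapses(W, cutting_ratio):
    """Zero out the given fraction of weights with the smallest magnitude.

    This is the simplest magnitude-based reading of synapse pruning by a
    connection cutting ratio; it is not the only method the text allows.
    The matrix size is unchanged, so this qualifies as a first contraction process.
    """
    flat = np.abs(W).ravel()
    n_cut = int(round(cutting_ratio * flat.size))
    if n_cut == 0:
        return W.copy()
    threshold = np.sort(flat)[n_cut - 1]
    return np.where(np.abs(W) <= threshold, 0.0, W)

W = np.random.default_rng(3).normal(size=(4, 6))
print(np.count_nonzero(prune_synapses(W, cutting_ratio=0.5)))  # roughly half of the 24 weights remain
```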
  • In step S30, for example, the low-rank ratio is set as the second contraction parameter.
  • In step S40, the network transformation/resynthesis processing unit 50 performs low-rank approximation using, for example, the low-rank ratio, and reduces the matrix size of the weight matrix.
  • Specifically, the processing described above in the derivation of the low-rank matrix is performed.
  • A sequentially executable low-rank approximation using QR decomposition, which does not rely on singular value decomposition, is efficient for reducing the matrix size.
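  • One possible reading of such a QR-based low-rank approximation is column-pivoted QR truncated to a target rank, sketched below (Python with SciPy; the economic mode, the target rank, and the matrix shape are illustrative choices, and the original text does not fix the exact algorithm):

```python
import numpy as np
from scipy.linalg import qr

def low_rank_by_qr(W, rank):
    """Column-pivoted QR factorisation truncated to the requested rank.

    Returns factors (Q_k, R_k, perm) with W[:, perm] ~= Q_k @ R_k, so the
    weight matrix can be stored and applied as two smaller matrices.
    """
    Q, R, perm = qr(W, mode="economic", pivoting=True)
    return Q[:, :rank], R[:rank, :], perm

W = np.random.default_rng(4).normal(size=(8, 12))
Qk, Rk, perm = low_rank_by_qr(W, rank=3)
approx = np.empty_like(W)
approx[:, perm] = Qk @ Rk                               # undo the column permutation
print(np.linalg.norm(W - approx) / np.linalg.norm(W))   # relative approximation error
```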
  • After that, the network transformation/resynthesis processing unit 50 deforms the network shape in accordance with the weight matrix whose size has been reduced.

<Main effects of this embodiment>
  • As described above, the weight matrix updated by the first contraction process is subjected to the second contraction process, and a weight matrix whose size is reduced is generated. With this configuration, the first contraction process and the second contraction process are performed consecutively, so it is possible to suppress an increase in the design period related to model contraction while combining a plurality of contraction methods.
  • Further, the learning process is performed on the weight matrix whose size has been reduced. With this configuration, it is not necessary to perform the learning process for each contraction process, so an increase in the design period related to model contraction can be suppressed.
  • the inference accuracy evaluation using the inference accuracy evaluation data is performed on the weight matrix whose size has been reduced. According to this configuration, it is possible to evaluate the accuracy of the neural network model after the reduction process.
  • model contraction process it is determined whether or not the model contraction process is continuously performed by using the analysis result by the learning process. According to this configuration, model contraction can be repeated, and a contraction model suitable for the user's request can be generated.
  • In the second embodiment described next, the first contraction process that changes the elements of the weight matrix is performed using both the first contraction parameter and the second contraction parameter.
  • FIG. 10 is a block diagram showing an example of the configuration of the model contraction device of the neural network according to the second embodiment of the present invention.
  • The model contraction device 201 of FIG. 10 includes a learning/evaluation control unit 10, a first contraction parameter reception unit 20, a weight calculation processing unit 230, a second contraction parameter reception unit 40, a network transformation/resynthesis processing unit 50, a weight matrix storage unit 60, a learning processing unit 70, an inference accuracy evaluation unit 80, and the like.
  • the second contraction parameter receiving unit 40 outputs the second contraction parameter input from the learning / evaluation control unit 10 to the weight calculation processing unit 230 and the network transformation / resynthesis processing unit 50.
  • the weight calculation processing unit 230 performs the first contraction processing on the weight matrix by using the first contraction parameter and the second contraction parameter.
  • FIG. 11 is a flow chart showing an outline of the model contraction method according to the second embodiment of the present invention.
  • the first contraction parameter is set in step S10
  • the second contraction parameter is set in step S30.
  • the weight calculation processing unit 230 performs the first contraction processing on the weight matrix by using the first contraction parameter and the second contraction parameter.
  • the weight calculation processing unit 230 stores the weight matrix updated by the first contraction processing in the weight matrix storage unit 60.
  • the weight matrix updated here depends on both the first contraction parameter and the second contraction parameter.
  • step S240 the network transformation / resynthesis processing unit 50 performs a second reduction process for reducing the matrix size of the weight matrix updated in step S220.
  • the network transformation / resynthesis processing unit 50 transforms the network shape in response to the weight matrix updated by the second contraction processing.
  • the deformed network shape is affected by the first contraction parameter and the second contraction parameter.
  • the network transformation / resynthesis processing unit 50 stores the updated weight matrix and the transformed network shape in each storage unit of the model storage unit 100.
  • FIG. 12 is a flow chart showing an example of the model contraction method according to the second embodiment. Steps corresponding to those in FIG. 11 are given the same reference numerals. In this example, low-rank approximation is assigned as the first contraction process, and neuron pruning is assigned as the second contraction process.
  • In step S10, the matrix rank threshold is set as the first contraction parameter.
  • In step S30, the difference threshold is set as the second contraction parameter.
  • The difference referred to here is the difference between the matrix rank of the original weight matrix before rank reduction and the matrix rank of the weight matrix after rank reduction, and is defined by the following equation (4).
  • The learning/evaluation control unit 10 sets the difference threshold (δ) for the difference (R_ij) as the second contraction parameter.
  • step S220 the weight calculation processing unit 230 uses the matrix rank threshold value and the difference threshold value to perform low-rank approximation to the weight matrix.
  • the weight calculation processing unit 230 changes the weight of each element of the weight matrix according to the following equation (5), for example.
  • As a result, the weight calculation processing unit 230 updates the weight matrix. That is, the weight calculation processing unit 230 compares the difference with the difference threshold: a component whose difference is equal to or larger than the difference threshold is regarded as strongly affected by the rank reduction and is set to 0, while the other components take the weights of the low-rank matrix, and the weight matrix is updated accordingly. The weight calculation processing unit 230 then derives the low-rank matrix by the same method as in the first embodiment.
  • In step S230, the network transformation/resynthesis processing unit 50 calculates the neuron pruning ratio for the low-rank matrix derived in step S220.
  • For example, the network transformation/resynthesis processing unit 50 may calculate the neuron pruning ratio from the number of zero-filled components in the low-rank matrix.
  • In step S240, the network transformation/resynthesis processing unit 50 performs neuron pruning according to the neuron pruning ratio calculated in step S230 and reduces the size of the weight matrix. Further, the network transformation/resynthesis processing unit 50 deforms the network shape in accordance with the weight matrix whose size has been reduced.
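  • The sketch below (Python/NumPy) puts steps S220-S230 together under explicit assumptions: equations (4) and (5) are not reproduced in this text, so R_ij is taken here to be the element-wise difference between the original weights and their low-rank approximation, δ the threshold on that difference, and the matrix rank threshold a magnitude threshold on the singular values as in the first example. It is an interpretation for illustration, not the definitive formula.

```python
import numpy as np

def first_contraction_with_difference(W, rank_threshold, delta):
    """Sketch of the second-embodiment first contraction (steps S220-S230).

    Assumptions: R_ij = |W_ij - W_low_ij| (element-wise difference between the
    original weights and their low-rank approximation); components with
    difference >= delta are set to 0, the rest keep the low-rank value.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_low = np.where(s < rank_threshold, 0.0, s)
    W_low = U @ np.diag(s_low) @ Vt
    diff = np.abs(W - W_low)                      # assumed reading of R_ij
    W_updated = np.where(diff >= delta, 0.0, W_low)
    # Step S230: the neuron pruning ratio can be derived from the number of
    # zero-filled components, as suggested in the text.
    pruning_ratio = np.count_nonzero(W_updated == 0.0) / W_updated.size
    return W_updated, pruning_ratio

W = np.random.default_rng(5).normal(size=(6, 6))
W_upd, ratio = first_contraction_with_difference(W, rank_threshold=1.0, delta=0.5)
print(ratio)
```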
  • the first contraction process using the first contraction parameter and the second contraction parameter is performed.
  • the weight matrix can be updated under the influence of the first contraction parameter and the second contraction parameter.
  • FIG. 13 is a block diagram showing an example of the configuration of the model contraction device of the neural network according to the third embodiment of the present invention.
  • The model contraction device 301 of FIG. 13 is similar to that of FIG. 10, but differs in that a contraction parameter calculation processing unit 315 is provided between the learning/evaluation control unit 310 and the first contraction parameter reception unit 20 and second contraction parameter reception unit 40.
  • The learning/evaluation control unit 310 calculates, as a contraction parameter, the contraction specific gravity, that is, the relative weight between the contraction amount of the first contraction process executed immediately before and the contraction amount of the second contraction process.
  • the contraction parameter calculation processing unit 315 calculates the first contraction parameter and the second contraction parameter using the contraction specific gravity calculated by the learning / evaluation control unit 310.
  • the contraction parameter calculation processing unit 315 outputs the calculated first contraction parameter and the second contraction parameter to the first contraction parameter reception unit 20 and the second contraction parameter reception unit 40, respectively.
  • In the third embodiment, the first contraction parameter and the second contraction parameter are calculated automatically using the contraction specific gravity, so the user does not directly input the first contraction parameter or the second contraction parameter except for their initial values.
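  • Purely as an illustration of how a single control value could drive both parameters (the text does not give the actual formula used by the contraction parameter calculation processing unit 315), one could split a total contraction target according to the specific gravity:

```python
def split_contraction_parameters(total_target, specific_gravity):
    """Illustrative only: assume the contraction specific gravity is the
    fraction of the total contraction amount assigned to the first
    contraction process; the remainder goes to the second contraction process.
    """
    first_param = specific_gravity * total_target
    second_param = (1.0 - specific_gravity) * total_target
    return first_param, second_param

# e.g. 60% of a 0.5 overall contraction target handled by the first process
print(split_contraction_parameters(0.5, 0.6))   # (0.3, 0.2)
```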
  • FIG. 14 is a flow chart showing an outline of the model contraction method according to the third embodiment of the present invention.
  • FIG. 14 is similar to FIG. 11, and corresponding steps are given the same reference numerals.
  • the learning / evaluation control unit 310 calculates the contraction specific gravity and outputs the contraction specific gravity to the contraction parameter calculation processing unit 315.
  • step S320 the contraction parameter calculation processing unit 315 calculates the first contraction parameter and the second contraction parameter using the input contraction specific gravity.
  • the calculated first contraction parameter is output to the weight calculation processing unit 30 via the first contraction parameter reception unit 20.
  • the calculated second contraction parameter is output to the weight calculation processing unit 30 and the network transformation / resynthesis processing unit 50 via the second contraction parameter reception unit 40.
  • In step S360, when the learning/evaluation control unit 310 determines that the model contraction process is to be continued (Yes), it returns to step S10 and recalculates the contraction parameter (contraction specific gravity). On the other hand, when the learning/evaluation control unit 310 determines that the model contraction process is not to be continued (No), the model contraction process ends.
  • In the third embodiment, the only parameter to be updated is the contraction specific gravity, so the number of loop iterations can be reduced and an increase in the design period related to model contraction can be suppressed.
  • The present invention is not limited to the above-described embodiments and includes various modifications. It is possible to replace a part of the configuration of one embodiment with the configuration of another embodiment, and it is also possible to add the configuration of another embodiment to the configuration of one embodiment. Further, it is possible to add, delete, or replace a part of the configuration of each embodiment with another configuration. It should be noted that the members and relative sizes shown in the drawings are simplified and idealized in order to explain the present invention in an easy-to-understand manner, and may have more complicated shapes in an actual implementation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed is a model contraction device 1 for a neural network, comprising: a weight calculation processing unit (first contraction processing unit) 30 that performs a first contraction process, which changes elements of a weight matrix using a first contraction parameter, and updates the weight matrix; and a network transformation/resynthesis processing unit (second contraction processing unit) 50 that performs a second contraction process, which reduces the size of the updated weight matrix using a second contraction parameter, and deforms the network shape in accordance with the reduced weight matrix.
PCT/JP2020/011067 2019-03-22 2020-03-13 Dispositif de réduction de modèle de réseau neuronal WO2020195940A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-055025 2019-03-22
JP2019055025A JP7150651B2 (ja) 2019-03-22 2019-03-22 ニューラルネットワークのモデル縮約装置

Publications (1)

Publication Number Publication Date
WO2020195940A1 true WO2020195940A1 (fr) 2020-10-01

Family

ID=72559403

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/011067 WO2020195940A1 (fr) 2019-03-22 2020-03-13 Dispositif de réduction de modèle de réseau neuronal

Country Status (2)

Country Link
JP (1) JP7150651B2 (fr)
WO (1) WO2020195940A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7502972B2 (ja) 2020-11-17 2024-06-19 株式会社日立ソリューションズ・テクノロジー プルーニング管理装置、プルーニング管理システム及びプルーニング管理方法
JP2022136575A (ja) * 2021-03-08 2022-09-21 オムロン株式会社 推論装置、モデル生成装置、推論方法、及び推論プログラム
JP2023083997A (ja) 2021-12-06 2023-06-16 株式会社デンソー モデル生成方法、モデル生成プログラム、モデル生成装置、データ処理装置

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180232640A1 (en) * 2017-02-10 2018-08-16 Samsung Electronics Co., Ltd. Automatic thresholds for neural network pruning and retraining
JP2018129033A (ja) * 2016-12-21 2018-08-16 アクシス アーベー 人工ニューラルネットワークのクラスに基づく枝刈り

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018129033A (ja) * 2016-12-21 2018-08-16 アクシス アーベー 人工ニューラルネットワークのクラスに基づく枝刈り
US20180232640A1 (en) * 2017-02-10 2018-08-16 Samsung Electronics Co., Ltd. Automatic thresholds for neural network pruning and retraining

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SASAKI, KENTA ET AL.: "Non-official translation: Adaptive compression of deep learning models by parameter reduction and reconstruction", DEIM FORUM 2018 FL-2, THE 10TH FORUM ON DATA ENGINEERING AND INFORMATION MANAGEMENT (THE 16TH ANNUAL CONFERENCE OF THE DATABASE SOCIETY OF JAPAN), 6 March 2018 (2018-03-06) *
WATANABE, CHIHIRO ET AL.: "Non-official translation: Extracting rough structure based on combined weights of positive and negative in multilayered neural networks", FORUM ON INFORMATION TECHNOLOGY 2017(FIT2017, 5 September 2017 (2017-09-05) *

Also Published As

Publication number Publication date
JP7150651B2 (ja) 2022-10-11
JP2020155010A (ja) 2020-09-24

Similar Documents

Publication Publication Date Title
US11875268B2 (en) Object recognition with reduced neural network weight precision
US11568258B2 (en) Operation method
KR102410820B1 (ko) 뉴럴 네트워크를 이용한 인식 방법 및 장치 및 상기 뉴럴 네트워크를 트레이닝하는 방법 및 장치
US11521064B2 (en) Training a neural network model
KR101880901B1 (ko) 기계 학습 방법 및 장치
WO2020195940A1 (fr) Dispositif de réduction de modèle de réseau neuronal
US20180260709A1 (en) Calculating device and method for a sparsely connected artificial neural network
US20190050734A1 (en) Compression method of deep neural networks
WO2019091020A1 (fr) Procédé de stockage de données de poids, et processeur de réseau neuronal basé sur le procédé
US9129222B2 (en) Method and apparatus for a local competitive learning rule that leads to sparse connectivity
US11392829B1 (en) Managing data sparsity for neural networks
CN107292352B (zh) 基于卷积神经网络的图像分类方法和装置
CN111105029B (zh) 神经网络的生成方法、生成装置和电子设备
WO2022105108A1 (fr) Procédé, appareil et dispositif de classification de données de réseau, et support de stockage lisible
JP6950756B2 (ja) ニューラルネットワークのランク最適化装置および最適化方法
US11657285B2 (en) Methods, systems, and media for random semi-structured row-wise pruning in neural networks
US11775832B2 (en) Device and method for artificial neural network operation
CN109447096B (zh) 一种基于机器学习的扫视路径预测方法和装置
JP2022165395A (ja) ニューラルネットワークモデルの最適化方法及びニューラルネットワークモデルに関するグラフィックユーザインターフェースを提供する方法
US20190311248A1 (en) Method for random sampled convolutions with low cost enhanced expressive power
CN116805157B (zh) 无人集群自主动态评估方法及装置
KR20220032861A (ko) 하드웨어에서의 성능을 고려한 뉴럴 아키텍처 서치 방법 빛 장치
WO2022127603A1 (fr) Procédé de traitement de modèle et dispositif associé
US20220121927A1 (en) Providing neural networks
CN117999560A (zh) 机器学习模型的硬件感知渐进训练

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20779296

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20779296

Country of ref document: EP

Kind code of ref document: A1