WO2020195940A1 - Model reduction device of neural network - Google Patents

Model reduction device of neural network Download PDF

Info

Publication number
WO2020195940A1
Authority
WO
WIPO (PCT)
Prior art keywords
contraction
model
weight matrix
parameter
processing unit
Prior art date
Application number
PCT/JP2020/011067
Other languages
French (fr)
Japanese (ja)
Inventor
晶子 正木
豪一 小野
光祥 猪貝
Original Assignee
株式会社日立ソリューションズ・テクノロジー
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立ソリューションズ・テクノロジー filed Critical 株式会社日立ソリューションズ・テクノロジー
Publication of WO2020195940A1 publication Critical patent/WO2020195940A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present invention relates to a model contraction device for a neural network.
  • Patent Document 1 discloses a neural network device that can ensure that the performance of the neural network satisfies a predetermined requirement, reduce the amount of matrix multiplication by reducing the dimension of the matrix, and thereby improve the processing speed of the neural network.
  • a method has also been developed that reduces the amount of calculation by omitting calculations related to weights that have low sensitivity to the inference result of the neural network.
  • such methods use, as reduction criteria, mutually independent indicators that express properties of the neural network model, such as the sparseness of the network structure, the cardinality, or matrix characteristics (e.g., singular values, principal components, and eigenvalues). These methods are, for example, the methods called neuron pruning, synaptic pruning, and low-rank approximation, and they take essentially the same approach as the calculation-reduction methods described above.
  • Non-Patent Document 1 discloses a technique that greatly reduces the amount of calculation by sequentially combining and applying different contraction methods, since a method based on a single type of contraction criterion is limited in how far the model can be reduced.
  • an object of the present invention is to provide a neural network model contraction device or the like capable of suppressing an increase in the design period related to model contraction while combining a plurality of contraction methods.
  • the model contraction device for a neural network includes a first contraction processing unit that performs a first contraction process of changing the elements of a weight matrix using a first contraction parameter and thereby updates the weight matrix, and a second contraction processing unit that performs a second contraction process of reducing the size of the updated weight matrix using a second contraction parameter and deforms the network shape in accordance with the reduced weight matrix.
  • FIG. 1 is a diagram illustrating an overall picture of a general neural network.
  • a convolutional neural network for a still image is given.
  • the still image of the input data is classified into each class defined by the user by the convolutional neural network.
  • the neural network of FIG. 1 has an input layer L10, hidden layers L20 to L90, and an output layer L100. These layers are realized on an arithmetic unit such as a processor by executing software, for example.
  • the neural network may include a plurality of each layer illustrated here.
  • the reference numerals of the layers are assigned for convenience and do not indicate the number of layers of the neural network.
  • the arithmetic unit transforms the input image into image data suitable for the arithmetic of the neural network. Then, the arithmetic unit stores the deformed image data in the storage unit.
  • Each layer such as the input layer L10 has a weight matrix corresponding to each layer.
  • the input layer L10 executes a convolution operation using a weight matrix on the deformed image data.
  • more specifically, the input layer L10 executes batch processing, processing related to the bias term, and the like in addition to the convolution operation.
  • the image data transitions to the state of the feature map.
  • the shallowest hidden layer L20 executes arithmetic processing on the feature map.
  • the output layer L100 calculates the probability distribution of each class to be classified for the input image by using, for example, an output function.
  • the output layer L100 outputs the classification result for the input image.
  • the arithmetic unit determines what kind of image the input image is.
  • FIG. 2 is a diagram illustrating a weight matrix of a neural network.
  • FIG. 2A is a diagram for explaining an operation using a weight matrix
  • FIG. 2B is a diagram illustrating a weight matrix.
  • Each neuron included in N1 and N3 in FIG. 2A is an arithmetic unit that returns a predetermined value for an input. Weights are assigned to each synapse included in S2. The product of the return value from each neuron of N1 in the previous layer and the weight of each corresponding synapse is input to each neuron of N3 in the next layer and summed.
  • Such arithmetic processing can be represented by a matrix operation, and the weights of each synapse summarized in a matrix format are represented as the weight matrix W in FIG. 2B.
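  • As an illustration of this matrix form, the following minimal sketch (hypothetical layer sizes and weight values, written in Python/NumPy and not part of the disclosure) computes the inputs to the next-layer neurons as the product of the weight matrix and the previous-layer outputs.

```python
import numpy as np

# Hypothetical example: 4 neurons in the previous layer (N1), 3 neurons in the next layer (N3).
# Each row of W holds the synaptic weights entering one neuron of N3.
W = np.array([[0.2, -0.5, 0.1, 0.7],
              [0.4,  0.3, -0.2, 0.0],
              [-0.1, 0.6, 0.5, -0.3]])

x = np.array([1.0, 0.5, -1.0, 2.0])  # return values of the N1 neurons

# Each N3 neuron receives the weighted sum of the N1 outputs.
y = W @ x
print(y)
```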
  • the size of the weight matrix and the elements indicating the weights differ from layer to layer.
  • in the following, the symbol "W" is used for the weight matrix; in particular, for a weight matrix that has been analytically updated, the symbol "W" with a tilde ("~") accent is used.
  • the convolutional neural network of still images is given as an example, but the input data is not limited to still images.
  • the neural network can perform the above-mentioned image recognition, as well as voice recognition, natural language processing, and recognition of the surrounding environment by recognizing temperature, humidity, and the flow rate of a fluid.
  • the type of neural network is not limited to convolutional neural networks, and can be applied as long as it is an operation that can be defined in a matrix format.
  • the output value in the output layer L100 is not limited to the classification, but can be changed according to the user's purpose such as the object detection result and the voice recognition result.
  • a model of a neural network (hereinafter sometimes simply referred to as a "model") includes the network shape and the weight matrix in each layer of the neural network. As will be described in detail later, the weight matrix is optimized by learning so as to satisfy a predetermined recognition accuracy set by the user.

<Configuration of model contraction device>
  • the model contraction device 1 is a functional block that performs model contraction of a neural network.
  • the model contraction process described below is performed on the input layer L10 to the deepest hidden layer L90 in FIG.
  • FIG. 3 is a block diagram showing an example of the configuration of the model contraction device of the neural network according to the first embodiment of the present invention.
  • the model contraction device 1 includes a learning / evaluation control unit (control unit) 10, a first contraction parameter reception unit 20, a weight calculation processing unit (first contraction processing unit) 30, a second contraction parameter reception unit 40, a network transformation / resynthesis processing unit (second contraction processing unit) 50, a weight matrix storage unit 60, a learning processing unit 70, an inference accuracy evaluation unit 80, and the like.
  • Each functional block constituting the model contraction device 1 other than the weight matrix storage unit 60 is realized, for example, by the processor executing software. Further, each functional block may be realized by hardware, or may be realized by cooperation between hardware and software.
  • a model storage unit 100, a learning data storage unit 71, an inference data storage unit 81, an input processing unit 90, and the like are connected to the model contraction device 1.
  • the model storage unit 100, the learning data storage unit 71, and the inference data storage unit 81 may be provided independently of each other, or may be integrally configured with these.
  • the weight matrix storage unit 60 in the model contraction device 1 may also be independent of these storage units, or may be integrally configured with these storage units.
  • the weight matrix storage unit 60 may be provided outside the model contraction device 1.
  • the input processing unit 90 is a functional block that specifies the contraction method executed in the model contraction device 1.
  • the input processing unit 90 is connected to, for example, an input interface, and notifies the learning / evaluation control unit 10 of a plurality of (for example, two types) reduction methods selected by the user via the input interface. Further, when the reduction parameter corresponding to the reduction method selected by the user is input, the input processing unit 90 outputs the input reduction parameter to the learning / evaluation control unit 10.
  • the contraction parameter is a parameter that determines the contraction rate of the model.
  • the contraction parameters are, for example, a threshold value of the contraction rate, a step size of the contraction rate, a calculation amount reduction rate, and the like, and those suitable for the contraction method selected by the user are appropriately set.
  • the model storage unit 100 is a storage medium for storing the model of the neural network. As shown in FIG. 3, the model storage unit 100 includes a weight matrix storage unit 101 that stores the weight matrix and a network shape storage unit 102 that stores the network shape. Although the weight matrix and the network shape are updated as appropriate by the model contraction process, the model storage unit 100 may store only the updated weight matrix and network shape, or may store the weight matrix and network shape both before and after the update.
  • the learning / evaluation control unit 10 is a functional block that controls processing related to learning / evaluation of the weight matrix, processing related to model contraction, and the like. As processing related to model contraction, the learning / evaluation control unit 10 assigns the two types of contraction methods notified from the input processing unit 90. For example, the learning / evaluation control unit 10 assigns the contraction executed by the weight calculation processing unit 30 to the first contraction and the contraction executed by the network transformation / resynthesis processing unit 50 to the second contraction. Then, the learning / evaluation control unit 10 sets the contraction parameters (first contraction parameter, second contraction parameter) corresponding to the first contraction and the second contraction, respectively.
  • the learning / evaluation control unit 10 outputs a first contraction notification, which notifies the contraction method set as the first contraction, and the first contraction parameter corresponding to the first contraction to the first contraction parameter reception unit 20.
  • likewise, the learning / evaluation control unit 10 outputs a second contraction notification, which notifies the contraction method set as the second contraction, and the second contraction parameter corresponding to the second contraction to the second contraction parameter reception unit 40.
  • as processing related to learning / evaluation of the weight matrix, the learning / evaluation control unit 10 judges whether or not to continue the model contraction process based on the evaluation results, obtained by the learning processing unit 70 and the inference accuracy evaluation unit 80, of the weight matrix after model contraction.
  • the learning / evaluation control unit 10 resets the contraction parameters and continues the model contraction process.
  • the first contraction parameter receiving unit 20 outputs the first contraction notification and the first contraction parameter input from the learning / evaluation control unit 10 to the weight calculation processing unit 30.
  • the weight calculation processing unit 30 performs the first contraction processing for the weight matrix stored in the weight matrix storage unit 101 based on the first contraction notification and the first contraction parameter input from the first contraction parameter reception unit 20. It is a functional block that performs. In the first contraction process, the weight calculation processing unit 30 updates the weight matrix by updating each element without changing the size of the weight matrix. The update of the weight matrix in the weight calculation processing unit 30 is not performed by optimization such as learning, but is performed analytically. Moreover, since the size of the matrix does not change, the shape of the network does not change in the first contraction. The weight calculation processing unit 30 stores the updated weight matrix in the weight matrix storage unit 60.
  • the second contraction parameter reception unit 40 outputs the second contraction notification input from the learning / evaluation control unit 10 and the second contraction parameter to the network transformation / resynthesis processing unit 50.
  • the network transformation / resynthesis processing unit 50 is a functional block that controls processing related to network transformation and resynthesis based on the second contraction notification and the second contraction parameter input from the second contraction parameter reception unit 40.
  • the network transformation / resynthesis processing unit 50 performs the second reduction with respect to the weight matrix updated by the weight calculation processing unit 30, reduces the size of the weight matrix, and transforms the weight matrix. Then, the network deformation / resynthesis processing unit 50 resynthesizes the network based on the deformed weight matrix and updates the network shape.
  • the network transformation / resynthesis processing unit 50 stores the updated network shape in the network shape storage unit 102.
  • the learning data storage unit 71 stores learning data and the like for performing learning processing on the weight matrix.
  • the learning processing unit 70 is a functional block that performs learning processing on the weight matrix stored in the weight matrix storage unit 101 by using the learning data of the learning data storage unit 71.
  • the learning processing unit 70 executes analysis processing using a weight matrix based on the learning data. Then, the learning processing unit 70 compares the analysis result with the learning data, optimizes the weight of each element in the weight matrix, and updates the weight matrix.
  • the inference accuracy evaluation by the inference accuracy evaluation unit 80 is performed on the weight matrix stored in the weight matrix storage unit 101, as will be described later.
  • the inference accuracy evaluation result is input to the learning / evaluation control unit 10, and during the learning process, the learning processing unit 70 receives control based on the inference accuracy evaluation result from the learning / evaluation control unit 10.
  • the inference data storage unit 81 stores inference accuracy evaluation data used for inference accuracy evaluation for the weight matrix.
  • the inference accuracy evaluation unit 80 evaluates the inference accuracy of the weight matrix stored in the weight matrix storage unit 101 using the inference accuracy evaluation data as test data.
  • the inference accuracy evaluation unit 80 outputs the inference accuracy evaluation result to the learning / evaluation control unit 10.

<Model contraction method>
  • FIG. 4 is a flow chart showing an outline of the model contraction method according to the first embodiment of the present invention.
  • the flow of FIG. 4 includes steps S10 to S60.
  • the first contraction parameter used for the first contraction process is set (step S10).
  • the learning / evaluation control unit 10 reads the initial value of the contraction parameter for the first contraction from the non-volatile memory (not shown), and sets the read initial value as the first contraction parameter. Then, the learning / evaluation control unit 10 outputs the set first contraction parameter to the first contraction parameter reception unit 20. In addition, the learning / evaluation control unit 10 may set a value set by the user as the first contraction parameter.
  • the weight calculation processing unit 30 performs the first contraction processing using the first contraction parameter set in step S10, and changes each element of the weight matrix (step S20).
  • the weight calculation processing unit 30 generates a weight matrix composed of each changed element and updates the weight matrix.
  • the weight calculation processing unit 30 stores the updated weight matrix in the weight matrix storage unit 60.
  • the second contraction parameter used for the second contraction process is set (step S30).
  • the learning / evaluation control unit 10 reads the initial value of the contraction parameter for the second contraction from the non-volatile memory (not shown), and sets the read initial value as the second contraction parameter. Then, the learning / evaluation control unit 10 outputs the set second contraction parameter to the second contraction parameter reception unit 40. In addition to this, the learning / evaluation control unit 10 may set a value set by the user as the second contraction parameter.
  • the network transformation / resynthesis processing unit 50 performs the second contraction process using the second contraction parameter set in step S30, and reduces the size of the updated weight matrix stored in the weight matrix storage unit 60 (step S40). Further, the network transformation / resynthesis processing unit 50 reads out the network shape stored in the network shape storage unit 102, deforms the read network shape in accordance with the weight matrix, and resynthesizes the network. The network transformation / resynthesis processing unit 50 stores the reduced weight matrix in the weight matrix storage unit 101 of the model storage unit 100, and stores the deformed network shape in the network shape storage unit 102.
  • the learning processing unit 70 reads the learning data from the learning data storage unit 71, and performs learning processing on the reduced weight matrix (step S50). Specifically, the learning processing unit 70 uses the learning data as input data and executes arithmetic processing using the reduced weight matrix stored in the weight matrix storage unit 101. The learning processing unit 70 uses the data of the output layer L100 or the data output from the output layer L100 as the calculation result, compares the input data with the calculation result, and optimizes the weight matrix. Further, the learning processing unit 70 outputs the learning result for the weight matrix to the learning / evaluation control unit 10.
  • the inference accuracy evaluation unit 80 reads the inference accuracy evaluation data from the inference data storage unit 81 and performs inference accuracy evaluation processing on the reduced weight matrix. Specifically, the inference accuracy evaluation unit 80 uses the inference accuracy evaluation data as input data and executes arithmetic processing using the reduced weight matrix stored in the weight matrix storage unit 101. The inference accuracy evaluation unit 80 takes the data of the output layer L100, or the data output from the output layer L100, as the calculation result, compares the inference accuracy evaluation data with the calculation result to evaluate the inference accuracy, and outputs the inference accuracy evaluation result to the learning / evaluation control unit 10. The inference accuracy evaluation by the inference accuracy evaluation unit 80 is performed a plurality of times at predetermined intervals.
  • the learning / evaluation control unit 10 uses the input learning result and the inference accuracy evaluation result to determine whether to continue or end the model contraction process (step S60). For example, when the learning / evaluation control unit 10 refers to the learning result and determines that the weight of each element in the weight matrix should be changed (1), it returns to step S10, resets the first contraction parameter and the second contraction parameter, and continues the model contraction process using the reset contraction parameters.
  • when the learning / evaluation control unit 10 determines that the weight of each element in the weight matrix does not need to be changed but, referring to the inference accuracy evaluation result, that the inference accuracy is lower than a predetermined threshold value or has dropped sharply (2), it returns to step S30, resets only the second contraction parameter, and continues the model contraction process using the reset contraction parameter. On the other hand, when the learning / evaluation control unit 10 determines that the weight of each element in the weight matrix of each layer does not need to be changed and the inference accuracy is higher than the predetermined threshold value (3), it determines that resetting of the contraction parameters is unnecessary and ends the model contraction process (END).
  • alternatively, the learning / evaluation control unit 10 may use, as judgment items, whether the inference accuracy has decreased a plurality of times in succession (for example, three times or more) and whether the reduction rate has reached the target value, to determine whether to continue or end the model contraction process. Specifically, when the learning / evaluation control unit 10 determines that the inference accuracy has decreased three or more times in succession and the model reduction rate has not reached the target (1), it returns to step S10, resets the first contraction parameter and the second contraction parameter, and continues the model contraction process using the reset contraction parameters.
  • when the learning / evaluation control unit 10 determines that the inference accuracy has decreased three or more times in succession or the model reduction rate has not reached the target (2), it returns to step S30, resets only the second contraction parameter, and continues the model contraction process using the reset contraction parameter.
  • when the learning / evaluation control unit 10 determines that the inference accuracy has not decreased three or more times in succession and the model contraction rate has reached the target (3), it determines that resetting of the contraction parameters is unnecessary and ends the model contraction process (END).
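  • The following is a minimal sketch of the judgment of step S60 under the example criteria above; the function name, thresholds, and return values are illustrative assumptions and not part of the disclosed device.

```python
def judge_continuation(accuracy_history, reduction_rate, target_rate,
                       max_consecutive_drops=3):
    """Decide how to continue model contraction, per the example criteria:
    (1) reset both parameters, (2) reset only the second, (3) end contraction."""
    # Count how many times the inference accuracy dropped in a row at the end of the history.
    drops = 0
    for prev, curr in zip(accuracy_history[:-1], accuracy_history[1:]):
        drops = drops + 1 if curr < prev else 0
    accuracy_degraded = drops >= max_consecutive_drops
    target_reached = reduction_rate >= target_rate

    if accuracy_degraded and not target_reached:
        return "reset_first_and_second"   # case (1): back to step S10
    if accuracy_degraded or not target_reached:
        return "reset_second_only"        # case (2): back to step S30
    return "end"                          # case (3): terminate model contraction
```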
  • the judgment items as to whether or not to continue the model contraction process can be arbitrarily set by the user.
  • when resetting, the learning / evaluation control unit 10 may set values input by the user for the contraction parameters, or may automatically reset the first contraction parameter and the second contraction parameter based on the model contraction rate.
  • the learning / evaluation control unit 10 may also determine whether to continue or end the model contraction process without using the inference accuracy evaluation result.

<<Specific example of model contraction method (1)>>
  • FIG. 5 is a flow chart showing an example of the model contraction method. Since FIG. 5 corresponds to FIG. 4, the reference numerals of the steps in FIG. 5 are matched with those of FIG. 4. In this example, low-rank approximation is assigned as the first contraction process, and neuron pruning is assigned as the second contraction process.
  • in step S10, the matrix rank threshold is set as the first contraction parameter. Then, the weight calculation processing unit 30 performs low-rank approximation using the matrix rank threshold and derives a low-rank matrix from the weight matrix (step S20).

<<<Derivation of the low-rank matrix>>>
  • the weight calculation processing unit 30 performs singular value decomposition of the weight matrix using the following equation (1).
  • here, U is the left singular vectors, tV is the transpose of the right singular vectors, and S is the singular value diagonal matrix.
  • the diagonal component of the singular value diagonal matrix S is composed of L singular values corresponding to the rank number L of the weight matrix.
  • the weight calculation processing unit 30 replaces with "0" the singular values smaller than a predetermined threshold value D, which correspond to components contributing little information, and resynthesizes the weight matrix using the replaced values. In this way, the weight calculation processing unit 30 generates a low-rank weight matrix.
  • the threshold value D may be appropriately set by the user based on the ratio to the number of ranks L, the absolute value with respect to the magnitude of the singular value, the threshold value with respect to the Frobenius norm, and the like.
  • Each component of the weight matrix lowered in this way is represented by the following equations (2) and (3).
  • Methods other than the above may be used, and methods such as principal component analysis, eigenvalue decomposition, and QR decomposition can be used. The user can appropriately select these methods.
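  • A minimal sketch of the singular-value-based derivation described above is shown below, assuming NumPy; the variable names and threshold handling are illustrative and do not reproduce equations (1) to (3) of the disclosure.

```python
import numpy as np

def low_rank_approximation(W, threshold_d):
    """Zero out singular values below the threshold D and resynthesize the weight matrix."""
    # Singular value decomposition: W = U * diag(s) * Vt
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Replace singular values smaller than the threshold D with 0 (low contribution of information).
    s_truncated = np.where(s >= threshold_d, s, 0.0)
    # Resynthesize the low-rank weight matrix from the remaining singular values.
    return U @ np.diag(s_truncated) @ Vt

W = np.random.randn(8, 6)
W_low_rank = low_rank_approximation(W, threshold_d=1.0)
print(np.linalg.matrix_rank(W_low_rank))
```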
  • in step S30, the reduction ratio of the matrix size is set as the second contraction parameter.
  • the network transformation / resynthesis processing unit 50 performs neuron pruning, based on the reduction ratio of the matrix size, on the low-rank matrix derived in step S20, and reduces the size of the low-rank matrix (step S40).
  • as the unit of the reduction ratio of the matrix size, for example, the number of neurons, the calculation amount reduction rate, and the like are used; the user can appropriately select the unit from these.
  • as the method of deleting neurons, for example, a method called "quantization pruning" described below is used.
  • alternatively, a method using the norm of the weights, which are the elements of the weight matrix, may be adopted.
  • in this method, the L1 norm or L2 norm of the weights entering each neuron is used as the evaluation value, and the neurons with the lowest evaluation values are deleted according to the reduction ratio.
  • these evaluation values are calculated by, for example, the weight calculation processing unit 30.
  • since the weight matrix has been low-ranked in step S20, neuron pruning using the evaluation value may affect the matrix rank.
  • the evaluation value may therefore be any value calculated from the matrix elements of the low-rank weight matrix. It is also possible to perform synaptic pruning at the same time as neuron pruning. Further, since the purpose is to reduce the matrix size, either the rows or the columns of the weight matrix may be deleted by neuron pruning.
  • then, the network transformation / resynthesis processing unit 50 deforms the network shape in accordance with the reduced weight matrix.
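  • A minimal sketch of the norm-based neuron pruning described above, assuming each row of the weight matrix holds the weights entering one neuron; the choice of the L2 norm and of row deletion is illustrative.

```python
import numpy as np

def prune_neurons_by_norm(W, reduction_ratio, norm_ord=2):
    """Delete the rows (neurons) of W whose weight norm is lowest,
    by the given reduction ratio of the matrix size."""
    # Evaluation value of each neuron: L1 or L2 norm of the weights entering it.
    scores = np.linalg.norm(W, ord=norm_ord, axis=1)
    n_delete = int(round(W.shape[0] * reduction_ratio))
    keep = np.argsort(scores)[n_delete:]   # indices of the neurons to keep
    keep = np.sort(keep)                   # preserve the original neuron order
    return W[keep, :], keep

W = np.random.randn(10, 6)
W_reduced, kept_neurons = prune_neurons_by_norm(W, reduction_ratio=0.3)
print(W_reduced.shape)  # (7, 6): the matrix size is reduced, so the network shape changes
```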
  • FIG. 6 is an explanatory diagram of model contraction by neuron pruning and synaptic pruning.
  • FIG. 6A is a diagram showing a specific example of reduction by neuron pruning and synaptic pruning,
  • and FIG. 6B is a diagram illustrating a weight matrix whose size has been reduced after neuron pruning and synaptic pruning.
  • the parts indicated by broken lines in FIG. 6A show the deleted neurons and synapses.
  • in this example, model contraction removes the central neuron in N1 and the second neuron from the left in N3.
  • the synapses connected to the deleted neurons are also deleted.
  • by performing this model contraction on the weight matrix, the size of the weight matrix is reduced as shown in FIG. 6 (b).

<<<Quantization pruning>>>
  • Quantization pruning is performed while leaving the weights discrete, as the activated neurons do not necessarily respond only to large weights. Quantization pruning can be applied not only to neuron pruning but also to synaptic pruning.
  • FIG. 7 is a diagram illustrating an execution procedure of quantization pruning.
  • the vertical axis of FIG. 7 is the evaluation value of each neuron.
  • As the evaluation value for example, the sum of the weights entering the neuron is used.
  • the horizontal axis of FIG. 7 is the neuron number. In the example of FIG. 7, in order to facilitate the explanation, the neuron number increases as the evaluation value increases. In the case of synaptic pruning, the vertical axis may represent the weight.
  • each neuron is classified into 6 clusters.
  • the neuron having the maximum evaluation value in each cluster is left, and the other neurons are deleted.
  • the neuron at the right end of each cluster is left as the representative neuron.
  • the evaluation values of the deleted neurons are hatched. After the neurons are deleted, neuron numbers are reassigned to the remaining representative neurons, and the distribution of the evaluation values is updated as shown in FIG. 7 (b).
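  • A minimal sketch of the quantization pruning procedure of FIG. 7 (cluster the evaluation values and keep only the neuron with the maximum evaluation value in each cluster); the equal-width clustering is an illustrative assumption, since the disclosure does not fix a particular clustering method.

```python
import numpy as np

def quantization_pruning(scores, n_clusters=6):
    """Keep, in each cluster of evaluation values, only the neuron with the
    maximum evaluation value; return the indices of the representative neurons."""
    # Equal-width clustering of the evaluation values (illustrative assumption).
    edges = np.linspace(scores.min(), scores.max(), n_clusters + 1)
    cluster_ids = np.clip(np.digitize(scores, edges[1:-1]), 0, n_clusters - 1)

    representatives = []
    for c in range(n_clusters):
        members = np.where(cluster_ids == c)[0]
        if members.size:
            # Leave the neuron with the maximum evaluation value; delete the others.
            representatives.append(members[np.argmax(scores[members])])
    return np.sort(np.array(representatives))

scores = np.random.rand(20)   # e.g., sum of the weights entering each neuron
kept = quantization_pruning(scores)
print(kept)                   # neuron numbers are reassigned to the survivors afterwards
```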
  • FIG. 8 is an explanatory diagram showing another example of the execution procedure of quantization pruning.
  • the example of FIG. 8 is the same as that of FIG. 7 up to the point where each neuron is classified into a cluster. After clustering, in each cluster, the neuron whose evaluation value is closest to the centroid value, which is the average value of the weights, is left as the representative neuron, and the other neurons are deleted. The evaluation value of the representative neuron is then overwritten with the centroid value. After the neurons are deleted, neuron numbers are reassigned to the remaining representative neurons, and the distribution of the evaluation values is updated in the same manner as in FIG. 7 (b).

<<Specific example of model contraction method (2)>>
  • FIG. 9 is a flow chart showing an example of the model contraction method. Since FIG. 9 corresponds to FIG. 4, the reference numerals of the steps in FIG. 9 are matched with those of FIG. 4.
  • in the example of FIG. 5, low-rank approximation was assigned as the first contraction process and neuron pruning as the second contraction process; in this example, synaptic pruning is assigned as the first contraction process and low-rank approximation as the second contraction process.
  • in this way, the contraction process can be executed even if the contents of the first contraction process and the second contraction process are exchanged.
  • in step S10, the connection pruning ratio is set as the first contraction parameter. Then, the weight calculation processing unit 30 performs synaptic pruning using the connection pruning ratio and changes the elements of the weight matrix. As a result, the weight calculation processing unit 30 updates the weight matrix with a reduced amount of information (step S20). In step S20, the weight calculation processing unit 30 performs synaptic pruning using, for example, a method such as the quantization pruning with the centroid value described with reference to FIG. 8.
  • next, in step S30, for example, the low-rank ratio is set as the second contraction parameter.
  • in step S40, the network transformation / resynthesis processing unit 50 performs low-rank approximation using, for example, the low-rank ratio, and reduces the matrix size of the weight matrix.
  • as the low-rank approximation here, the processing described in the above-mentioned derivation of the low-rank matrix is performed.
  • a low-rank approximation that can be executed sequentially, such as one using QR decomposition, which is not singular value decomposition, is efficient for reducing the matrix size.
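  • A minimal sketch of a rank reduction based on QR decomposition is shown below; truncating the factors to the first k columns and rows is one simple, illustrative way to obtain a rank-k approximation and is not asserted to be the exact method of the disclosure.

```python
import numpy as np

def qr_rank_reduction(W, k):
    """Approximate W by a rank-k matrix using QR decomposition instead of SVD."""
    Q, R = np.linalg.qr(W)          # W = Q @ R
    # Keep only the first k columns of Q and the first k rows of R.
    return Q[:, :k] @ R[:k, :]

W = np.random.randn(8, 6)
W_k = qr_rank_reduction(W, k=3)
print(np.linalg.matrix_rank(W_k))   # at most 3
```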
  • the network transformation / resynthesis processing unit 50 transforms the network shape in response to the weight matrix whose size has been reduced.

<Main effects of this embodiment>
  • the weight matrix updated by the first contraction process is subjected to the second contraction process, and a weight matrix whose size is reduced is generated. According to this configuration, since the first contraction process and the second contraction process are performed continuously, it is possible to suppress an increase in the design period related to model contraction while combining a plurality of contraction methods.
  • the learning process using the learning data is performed on the weight matrix whose size has been reduced. According to this configuration, it is not necessary to perform the learning process for each contraction process, so an increase in the design period related to model contraction can be suppressed.
  • the inference accuracy evaluation using the inference accuracy evaluation data is performed on the weight matrix whose size has been reduced. According to this configuration, it is possible to evaluate the accuracy of the neural network model after the reduction process.
  • further, whether or not to continue the model contraction process is determined using the analysis result of the learning process. According to this configuration, model contraction can be repeated, and a contraction model suitable for the user's request can be generated.
  • in the second embodiment, the first contraction process for changing the elements of the weight matrix is performed using the first contraction parameter and the second contraction parameter.
  • FIG. 10 is a block diagram showing an example of the configuration of the model contraction device of the neural network according to the second embodiment of the present invention.
  • the model contraction device 201 of FIG. 10 includes a learning / evaluation control unit 10, a first contraction parameter reception unit 20, a weight calculation processing unit 230, a second contraction parameter reception unit 40, and a network deformation / resynthesis processing unit 50. It includes a weight matrix storage unit 60, a learning processing unit 70, an inference accuracy evaluation unit 80, and the like.
  • the second contraction parameter receiving unit 40 outputs the second contraction parameter input from the learning / evaluation control unit 10 to the weight calculation processing unit 230 and the network transformation / resynthesis processing unit 50.
  • the weight calculation processing unit 230 performs the first contraction processing on the weight matrix by using the first contraction parameter and the second contraction parameter.
  • FIG. 11 is a flow chart showing an outline of the model contraction method according to the second embodiment of the present invention.
  • the first contraction parameter is set in step S10
  • the second contraction parameter is set in step S30.
  • the weight calculation processing unit 230 performs the first contraction processing on the weight matrix by using the first contraction parameter and the second contraction parameter.
  • the weight calculation processing unit 230 stores the weight matrix updated by the first contraction processing in the weight matrix storage unit 60.
  • the weight matrix updated here depends on both the first contraction parameter and the second contraction parameter.
  • step S240 the network transformation / resynthesis processing unit 50 performs a second reduction process for reducing the matrix size of the weight matrix updated in step S220.
  • the network transformation / resynthesis processing unit 50 transforms the network shape in response to the weight matrix updated by the second contraction processing.
  • the deformed network shape is affected by the first contraction parameter and the second contraction parameter.
  • the network transformation / resynthesis processing unit 50 stores the updated weight matrix and the transformed network shape in each storage unit of the model storage unit 100.
  • FIG. 12 is a flow chart showing an example of the model contraction method according to the second embodiment. Since FIG. 12 has steps corresponding to FIG. 11, the corresponding steps are designated by the same reference numerals. In this example, low-rank approximation is assigned as the first contraction process, and neuron pruning is assigned as the second contraction process.
  • in step S10, the matrix rank threshold is set as the first contraction parameter.
  • in step S30, the difference threshold is set as the second contraction parameter.
  • the difference referred to here is the difference between the original weight matrix before low-rank approximation and the weight matrix after low-rank approximation, and is defined by equation (4).
  • the learning / evaluation control unit 10 sets the difference threshold (δ) for the difference (R_ij) as the second contraction parameter.
  • step S220 the weight calculation processing unit 230 uses the matrix rank threshold value and the difference threshold value to perform low-rank approximation to the weight matrix.
  • the weight calculation processing unit 230 changes the weight of each element of the weight matrix according to the following equation (5), for example.
  • in this way, the weight calculation processing unit 230 updates the weight matrix. That is, the weight calculation processing unit 230 compares each difference with the difference threshold: for a component whose difference is equal to or larger than the difference threshold, which is strongly affected by the low-rank approximation, the element is set to 0, and for the other components, the element of the low-rank matrix is used, thereby updating the weight matrix. Then, the weight calculation processing unit 230 derives the low-rank matrix by the same method as in the first embodiment.
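  • A minimal sketch of this element-wise update is shown below, under the assumption (equations (4) and (5) are not reproduced here) that the difference R_ij is the element-wise difference between the original and low-rank weight matrices and that elements whose difference reaches the threshold δ are set to 0.

```python
import numpy as np

def first_contraction_with_difference_threshold(W, W_low_rank, delta):
    """Assumed reading of equations (4)-(5): zero out elements strongly affected
    by the low-rank approximation, keep the low-rank values elsewhere."""
    difference = np.abs(W - W_low_rank)           # assumed element-wise difference R_ij
    return np.where(difference >= delta, 0.0, W_low_rank)

W = np.random.randn(6, 6)
# W_low_rank would come from the low-rank approximation using the matrix rank threshold.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
s[3:] = 0.0
W_low_rank = U @ np.diag(s) @ Vt

W_updated = first_contraction_with_difference_threshold(W, W_low_rank, delta=0.1)
# The number of zero-filled components can later be used for the neuron pruning ratio.
print(np.count_nonzero(W_updated == 0))
```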
  • next, in step S230, the network transformation / resynthesis processing unit 50 calculates the neuron pruning ratio for the low-rank matrix derived in step S220.
  • for example, the network transformation / resynthesis processing unit 50 may calculate the number of zero-filled components in the low-rank matrix as the neuron pruning ratio.
  • in step S240, the network transformation / resynthesis processing unit 50 performs neuron pruning according to the neuron pruning ratio calculated in step S230, and reduces the size of the weight matrix. Further, the network transformation / resynthesis processing unit 50 deforms the network shape in accordance with the weight matrix whose size has been reduced.
  • the first contraction process using the first contraction parameter and the second contraction parameter is performed.
  • the weight matrix can be updated under the influence of the first contraction parameter and the second contraction parameter.
  • FIG. 13 is a block diagram showing an example of the configuration of the model contraction device of the neural network according to the third embodiment of the present invention.
  • the model contraction device 301 of FIG. 13 is similar to that of FIG. 10, but differs in that a contraction parameter calculation processing unit 315 is provided between the learning / evaluation control unit 310 and the first contraction parameter reception unit 20 and the second contraction parameter reception unit 40.
  • the learning / evaluation control unit 310 calculates, as a contraction parameter, the contraction specific gravity between the contraction amount of the first contraction process executed immediately before and the contraction amount of the second contraction process.
  • the contraction parameter calculation processing unit 315 calculates the first contraction parameter and the second contraction parameter using the contraction specific gravity calculated by the learning / evaluation control unit 310.
  • the contraction parameter calculation processing unit 315 outputs the calculated first contraction parameter and the second contraction parameter to the first contraction parameter reception unit 20 and the second contraction parameter reception unit 40, respectively.
  • in the third embodiment, the first contraction parameter and the second contraction parameter are automatically calculated using the contraction specific gravity, so the user does not directly input the first contraction parameter and the second contraction parameter except for their initial values.
  • FIG. 14 is a flow chart showing an outline of the model contraction method according to the third embodiment of the present invention.
  • FIG. 14 is similar to FIG. 11, and the corresponding steps are designated by the same reference numerals.
  • the learning / evaluation control unit 310 calculates the contraction specific gravity and outputs the contraction specific gravity to the contraction parameter calculation processing unit 315.
  • step S320 the contraction parameter calculation processing unit 315 calculates the first contraction parameter and the second contraction parameter using the input contraction specific gravity.
  • the calculated first contraction parameter is output to the weight calculation processing unit 30 via the first contraction parameter reception unit 20.
  • the calculated second contraction parameter is output to the weight calculation processing unit 30 and the network transformation / resynthesis processing unit 50 via the second contraction parameter reception unit 40.
  • step S360 when the learning / evaluation control unit 310 determines that the model contraction process is to be continued (Yes), it returns to step S10 and calculates the contraction parameter (reduction specific gravity). On the other hand, when the learning / evaluation control unit 310 determines that the model contraction process is not continued (No), the model contraction process ends.
  • according to the third embodiment, since the only parameter to be updated is the contraction specific gravity, the number of loops can be reduced and an increase in the design period related to model contraction can be suppressed.
  • the present invention is not limited to the above-described embodiments and includes various modifications. It is possible to replace a part of the configuration of one embodiment with the configuration of another embodiment, and it is also possible to add the configuration of another embodiment to the configuration of one embodiment. Further, it is possible to add, delete, or replace a part of the configuration of each embodiment with another configuration. It should be noted that the members and relative sizes shown in the drawings are simplified and idealized in order to explain the present invention in an easy-to-understand manner, and may have more complicated shapes in an actual implementation.

Abstract

This model reduction device 1 of a neural network is provided with: a weight calculation processing unit (first reduction processing unit) 30 which performs a first reduction process that changes elements of a weight matrix by using first reduction parameters, and updates the weight matrix; and a network deformation/recombination processing unit (second reduction processing unit) 50 which performs a second reduction process that reduces the size of the updated weight matrix by using second reduction parameters, and deforms the shape of the network in association with the reduced weight matrix.

Description

Neural network model contraction device
 The present invention relates to a model contraction device for a neural network.
 In recent years, the use of devices equipped with artificial intelligence has been advancing. In artificial intelligence using a neural network, the model is designed so that the recognition accuracy can be improved, and the higher the accuracy of the neural network, the larger the scale of the model. However, a large-scale neural network model contains many redundant parts. When performing inference calculations with a neural network on an embedded device, there are limits to the amount of data that can be handled and to the power consumption. For this reason, techniques have been developed that reduce the redundant parts while maintaining the recognition accuracy.
 For example, Patent Document 1 discloses a neural network device that can ensure that the performance of the neural network satisfies a predetermined requirement, reduce the amount of matrix multiplication by reducing the dimension of the matrix, and thereby improve the processing speed of the neural network.
 On the other hand, a method has also been developed that reduces the amount of calculation by omitting calculations related to weights that have low sensitivity to the inference result of the neural network. Such methods use, as reduction criteria, mutually independent indicators that express properties of the neural network model, such as the sparseness of the network structure, the cardinality, or matrix characteristics (e.g., singular values, principal components, and eigenvalues). These methods are, for example, the methods called neuron pruning, synaptic pruning, and low-rank approximation, and they take essentially the same approach as the calculation-reduction methods described above.
 In addition, since a method based on a single type of reduction criterion is limited in how far the model can be reduced, Non-Patent Document 1 discloses a technique that greatly reduces the amount of calculation by sequentially combining and applying different methods.
JP-A-2018-109947
 However, simply combining different reduction methods sequentially inevitably increases the number of loop points by the number of reduction methods, which complicates the design procedure related to model reduction and increases the design period.
 Therefore, an object of the present invention is to provide a neural network model contraction device or the like capable of suppressing an increase in the design period related to model contraction while combining a plurality of contraction methods.
 A brief outline of a representative one of the inventions disclosed in the present application is as follows.
 The model contraction device for a neural network according to a representative embodiment of the present invention includes a first contraction processing unit that performs a first contraction process of changing the elements of a weight matrix using a first contraction parameter and thereby updates the weight matrix, and a second contraction processing unit that performs a second contraction process of reducing the size of the updated weight matrix using a second contraction parameter and deforms the network shape in accordance with the reduced weight matrix.
 Among the inventions disclosed in the present application, the effects obtained by the representative one are briefly described as follows.
 That is, according to a representative embodiment of the present invention, it is possible to suppress an increase in the design period related to model contraction while combining a plurality of contraction methods.
FIG. 1 is a diagram illustrating an overall picture of a general neural network.
FIG. 2 is a diagram illustrating a weight matrix of a neural network.
FIG. 3 is a block diagram showing an example of the configuration of the model contraction device of the neural network according to the first embodiment of the present invention.
FIG. 4 is a flow chart showing an outline of the model contraction method according to the first embodiment of the present invention.
FIG. 5 is a flow chart showing an example of the model contraction method.
FIG. 6 is an explanatory diagram of model contraction by neuron pruning and synaptic pruning.
FIG. 7 is a diagram illustrating an execution procedure of quantization pruning.
FIG. 8 is an explanatory diagram showing another example of the execution procedure of quantization pruning.
FIG. 9 is a flow chart showing an example of the model contraction method.
FIG. 10 is a block diagram showing an example of the configuration of the model contraction device of the neural network according to the second embodiment of the present invention.
FIG. 11 is a flow chart showing an outline of the model contraction method according to the second embodiment of the present invention.
FIG. 12 is a flow chart showing an example of the model contraction method according to the second embodiment.
FIG. 13 is a block diagram showing an example of the configuration of the model contraction device of the neural network according to the third embodiment of the present invention.
FIG. 14 is a flow chart showing an outline of the model contraction method according to the third embodiment of the present invention.
 Hereinafter, embodiments of the present invention will be described with reference to the drawings. Each embodiment described below is an example for realizing the present invention and does not limit the technical scope of the present invention. In the embodiments, members having the same function are designated by the same reference numerals, and repeated description thereof is omitted unless particularly necessary.
 (Embodiment 1)
 <Overview of neural network>
 First, the overall picture of a neural network will be explained. FIG. 1 is a diagram illustrating an overall picture of a general neural network. Here, as a specific example of a neural network, a convolutional neural network for still images is used. Further, in FIG. 1, the purpose is to classify a still image given as input data into classes defined by the user by means of the convolutional neural network.
 The neural network of FIG. 1 has an input layer L10, hidden layers L20 to L90, and an output layer L100. These layers are realized on an arithmetic unit such as a processor by executing software, for example.
 From the input layer L10 to the deepest hidden layer L90, there are layers such as convolution layers, fully connected layers, and pooling layers. The neural network may include a plurality of each of the layers illustrated here. The reference numerals of the layers are assigned for convenience and do not indicate the number of layers of the neural network.
 Before the arithmetic processing in the input layer L10, the arithmetic unit transforms the input image into image data suitable for the operations of the neural network. The arithmetic unit then stores the transformed image data in a storage unit.
 Each layer such as the input layer L10 has its own corresponding weight matrix. The input layer L10 executes a convolution operation using its weight matrix on the transformed image data. More specifically, the input layer L10 executes batch processing, processing related to the bias term, and the like in addition to the convolution operation. As a result, the image data transitions to the state of a feature map. Subsequently, the shallowest hidden layer L20 executes arithmetic processing on the feature map. Then, after the processing in each layer is executed, the output layer L100 calculates, for example using an output function, the probability distribution over the classes into which the input image is to be classified. The output layer L100 then outputs the classification result for the input image, and the arithmetic unit determines what kind of image the input image is.
 Here, the weight matrix will be described using a one-dimensional neural network as an example. FIG. 2 is a diagram illustrating a weight matrix of a neural network. FIG. 2 (a) is a diagram for explaining an operation using a weight matrix, and FIG. 2 (b) is a diagram illustrating a weight matrix. In FIG. 2 (a), the operation proceeds in the direction of the arrows. Each neuron included in N1 and N3 in FIG. 2 (a) is an arithmetic unit that returns a predetermined value for an input. A weight is assigned to each synapse included in S2. The product of the return value from each neuron of N1 in the previous layer and the weight of the corresponding synapse is input to each neuron of N3 in the next layer and summed. Such arithmetic processing can be represented by a matrix operation, and the weights of the synapses summarized in matrix form are represented as the weight matrix W in FIG. 2 (b). The size of the weight matrix and its weight elements differ from layer to layer.
 In the following, the symbol "W" is used for the weight matrix; in particular, for a weight matrix that has been analytically updated, the symbol "W" with a tilde ("~") accent is used.
 Here, a convolutional neural network for still images is given as an example, but the input data is not limited to still images. The neural network can perform the above-mentioned image recognition, as well as voice recognition, natural language processing, and recognition of the surrounding environment by recognizing temperature, humidity, and the flow rate of a fluid.
 The type of neural network is also not limited to convolutional neural networks; any network whose operations can be defined in matrix form is applicable. The output value of the output layer L100 is also not limited to classification, and can be changed according to the user's purpose, such as an object detection result or a voice recognition result.
 A neural network model (hereinafter sometimes simply referred to as a "model") includes the network shape and the weight matrix of each layer of the neural network. As will be described in detail later, the weight matrix is optimized by learning so as to satisfy a predetermined recognition accuracy set by the user.
 <Configuration of the model contraction device>
 Next, the configuration of the model contraction device will be described. The model contraction device 1 is a functional block that performs model contraction of a neural network. The model contraction process described below is applied to the input layer L10 through the deepest hidden layer L90 in FIG. 1.
 FIG. 3 is a block diagram showing an example of the configuration of the neural network model contraction device according to the first embodiment of the present invention. As shown in FIG. 3, the model contraction device 1 includes a learning/evaluation control unit (control unit) 10, a first contraction parameter reception unit 20, a weight calculation processing unit (first contraction processing unit) 30, a second contraction parameter reception unit 40, a network transformation/resynthesis processing unit (second contraction processing unit) 50, a weight matrix storage unit 60, a learning processing unit 70, an inference accuracy evaluation unit 80, and the like.
 Each functional block of the model contraction device 1 other than the weight matrix storage unit 60 is realized, for example, by a processor executing software. Each functional block may also be realized by hardware, or by cooperation between hardware and software.
 As shown in FIG. 3, a model storage unit 100, a training data storage unit 71, an inference data storage unit 81, an input processing unit 90, and the like are connected to the model contraction device 1. The model storage unit 100, the training data storage unit 71, and the inference data storage unit 81 may be provided independently of one another, or may be configured as a single unit. Likewise, the weight matrix storage unit 60 in the model contraction device 1 may be independent of these storage units or integrated with them, and it may also be provided outside the model contraction device 1.
 The input processing unit 90 is a functional block that specifies the contraction methods to be executed by the model contraction device 1. The input processing unit 90 is connected to, for example, an input interface, and notifies the learning/evaluation control unit 10 of the plurality of (for example, two) contraction methods selected by the user via the input interface. When contraction parameters corresponding to the contraction methods selected by the user are input, the input processing unit 90 outputs the input contraction parameters to the learning/evaluation control unit 10. A contraction parameter is a parameter that determines the contraction rate of the model. Contraction parameters include, for example, a contraction rate threshold, a contraction rate step size, and a computation reduction rate, and values suitable for the contraction method selected by the user are set as appropriate.
 The model storage unit 100 is a storage medium that stores the neural network model. As shown in FIG. 3, the model storage unit 100 includes a weight matrix storage unit 101 that stores the weight matrices and a network shape storage unit 102 that stores the network shape. The weight matrices and network shape are updated as the model contraction process proceeds; the model storage unit 100 may store only the updated weight matrices and network shape, or may store the weight matrices and network shape both before and after the update.
 The learning/evaluation control unit 10 is a functional block that controls processing related to the learning and evaluation of the weight matrices and processing related to model contraction. As processing related to model contraction, the learning/evaluation control unit 10 assigns the two contraction methods notified from the input processing unit 90. For example, the learning/evaluation control unit 10 assigns the contraction executed by the weight calculation processing unit 30 to the first contraction, and the contraction executed by the network transformation/resynthesis processing unit 50 to the second contraction. The learning/evaluation control unit 10 then sets the contraction parameters (first contraction parameter and second contraction parameter) corresponding to the first and second contractions. The learning/evaluation control unit 10 outputs a first contraction notification indicating the contraction method set as the first contraction, together with the corresponding first contraction parameter, to the first contraction parameter reception unit 20. Similarly, it outputs a second contraction notification indicating the contraction method set as the second contraction, together with the corresponding second contraction parameter, to the second contraction parameter reception unit 40.
 As processing related to the learning and evaluation of the weight matrices, the learning/evaluation control unit 10 also determines whether or not to continue the model contraction process, based on the evaluation results of the weight matrices after model contraction obtained by the learning processing unit 70 and the inference accuracy evaluation unit 80. When the model contraction process is continued, the learning/evaluation control unit 10 resets the contraction parameters and lets the model contraction process continue.
 The first contraction parameter reception unit 20 outputs the first contraction notification and the first contraction parameter input from the learning/evaluation control unit 10 to the weight calculation processing unit 30.
 The weight calculation processing unit 30 is a functional block that performs the first contraction process on the weight matrix stored in the weight matrix storage unit 101, based on the first contraction notification and the first contraction parameter input from the first contraction parameter reception unit 20. In the first contraction process, the weight calculation processing unit 30 updates the weight matrix by updating each element without changing the size of the weight matrix. The update of the weight matrix in the weight calculation processing unit 30 is performed analytically, not by optimization such as learning. Since the matrix size does not change, the shape of the network also does not change in the first contraction. The weight calculation processing unit 30 stores the updated weight matrix in the weight matrix storage unit 60.
 The second contraction parameter reception unit 40 outputs the second contraction notification and the second contraction parameter input from the learning/evaluation control unit 10 to the network transformation/resynthesis processing unit 50.
 The network transformation/resynthesis processing unit 50 is a functional block that controls processing related to the transformation and resynthesis of the network, based on the second contraction notification and the second contraction parameter input from the second contraction parameter reception unit 40. The network transformation/resynthesis processing unit 50 performs the second contraction on the weight matrix updated by the weight calculation processing unit 30, reducing the size of the weight matrix and thereby transforming it. The network transformation/resynthesis processing unit 50 then resynthesizes the network based on the transformed weight matrix, updates the network shape, and stores the updated network shape in the network shape storage unit 102.
 The training data storage unit 71 stores training data and the like for performing learning processing on the weight matrices. The learning processing unit 70 is a functional block that performs learning processing on the weight matrix stored in the weight matrix storage unit 101 using the training data in the training data storage unit 71. Based on the training data, the learning processing unit 70 executes analysis processing using the weight matrix, compares the analysis result with the training data, optimizes the weight of each element in the weight matrix, and updates the weight matrix.
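 The embodiment does not fix a particular loss function or optimizer, so the following is only a minimal sketch of this kind of re-optimization, assuming a single linear layer and a mean-squared-error objective (both assumptions are made purely for illustration).

```python
import numpy as np

def retrain(W, x_train, y_train, lr=0.01, epochs=100):
    """Re-optimize a (possibly reduced) weight matrix by gradient descent on a
    mean-squared-error objective: run the analysis (forward) pass, compare the
    result with the training data, and update every remaining element."""
    for _ in range(epochs):
        y_pred = x_train @ W.T                               # analysis pass with the current weights
        grad = 2.0 * (y_pred - y_train).T @ x_train / len(x_train)
        W = W - lr * grad                                    # optimize each element of W
    return W
```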
 As will be described later, the weight matrix stored in the weight matrix storage unit 101 is subjected to inference accuracy evaluation by the inference accuracy evaluation unit 80. The inference accuracy evaluation result is input to the learning/evaluation control unit 10, and during the learning process the learning processing unit 70 is controlled by the learning/evaluation control unit 10 based on the inference accuracy evaluation result.
 The inference data storage unit 81 stores inference accuracy evaluation data used for evaluating the inference accuracy of the weight matrices. The inference accuracy evaluation unit 80 evaluates the inference accuracy of the weight matrix stored in the weight matrix storage unit 101, using the inference accuracy evaluation data as test data, and outputs the inference accuracy evaluation result to the learning/evaluation control unit 10.
 <Model contraction method>
 Next, the model contraction method in the present embodiment will be described. FIG. 4 is a flow chart showing the outline of the model contraction method according to the first embodiment of the present invention. The flow of FIG. 4 includes steps S10 to S60.
 When the model contraction process is started (START), the first contraction parameter used in the first contraction process is set (step S10). The learning/evaluation control unit 10, for example, reads the initial value of the contraction parameter for the first contraction from a non-volatile memory (not shown) and sets the read initial value as the first contraction parameter. The learning/evaluation control unit 10 then outputs the set first contraction parameter to the first contraction parameter reception unit 20. Alternatively, the learning/evaluation control unit 10 may set a value specified by the user as the first contraction parameter.
 The weight calculation processing unit 30 then performs the first contraction process using the first contraction parameter set in step S10 and changes the elements of the weight matrix (step S20). The weight calculation processing unit 30 generates a weight matrix composed of the changed elements, thereby updating the weight matrix, and stores the updated weight matrix in the weight matrix storage unit 60.
 Next, the second contraction parameter used in the second contraction process is set (step S30). The learning/evaluation control unit 10, for example, reads the initial value of the contraction parameter for the second contraction from the non-volatile memory (not shown) and sets the read initial value as the second contraction parameter. The learning/evaluation control unit 10 then outputs the set second contraction parameter to the second contraction parameter reception unit 40. Alternatively, the learning/evaluation control unit 10 may set a value specified by the user as the second contraction parameter.
 The network transformation/resynthesis processing unit 50 then performs the second contraction process using the second contraction parameter set in step S30 and reduces the size of the updated weight matrix stored in the weight matrix storage unit 60 (step S40). Furthermore, the network transformation/resynthesis processing unit 50 reads the network shape stored in the network shape storage unit 102, transforms the read network shape so that it corresponds to the weight matrix, and re-couples the network. The network transformation/resynthesis processing unit 50 stores the reduced weight matrix in the weight matrix storage unit 101 of the model storage unit 100 and stores the transformed network shape in the network shape storage unit 102.
 The learning processing unit 70 reads the training data from the training data storage unit 71 and performs learning processing on the reduced weight matrix (step S50). Specifically, the learning processing unit 70 uses the training data as input data and executes arithmetic processing using the reduced weight matrix stored in the weight matrix storage unit 101. The learning processing unit 70 takes the data of the output layer L100, or the data output from the output layer L100, as the computation result, compares the input data with the computation result, and optimizes the weight matrix. The learning processing unit 70 also outputs the learning result for the weight matrix to the learning/evaluation control unit 10.
 In parallel, the inference accuracy evaluation unit 80 reads the inference accuracy evaluation data from the inference data storage unit 81 and performs inference accuracy evaluation processing on the reduced weight matrix. Specifically, the inference accuracy evaluation unit 80 uses the inference accuracy evaluation data as input data and executes arithmetic processing using the reduced weight matrix stored in the weight matrix storage unit 101. The inference accuracy evaluation unit 80 takes the data of the output layer L100, or the data output from the output layer L100, as the computation result, compares the inference accuracy evaluation data with the computation result to evaluate the inference accuracy, and outputs the inference accuracy evaluation result to the learning/evaluation control unit 10. The inference accuracy evaluation by the inference accuracy evaluation unit 80 is performed a plurality of times at predetermined intervals.
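 As a minimal sketch of this evaluation, assuming a single-layer classifier purely for illustration (the embodiment only requires that the evaluation data be run through the model and compared with the expected output):

```python
import numpy as np

def evaluate_inference_accuracy(W, x_test, labels):
    """Run the reduced weight matrix on the evaluation data and compare the
    predicted class with the reference label, returning the accuracy."""
    logits = x_test @ W.T
    predictions = np.argmax(logits, axis=1)
    return float(np.mean(predictions == labels))
```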
 The learning/evaluation control unit 10 uses the input learning result and inference accuracy evaluation result to determine whether to continue or end the model contraction process (step S60). For example, when the learning/evaluation control unit 10 refers to the learning result and determines that the weights of the elements in the weight matrix should be changed (1), it returns to step S10, resets the first contraction parameter and the second contraction parameter, and continues the model contraction process using the reset contraction parameters.
 Also, for example, when the learning/evaluation control unit 10 determines that the weights of the elements in the weight matrix need not be changed but, referring to the inference accuracy evaluation result, that the inference accuracy is lower than a predetermined threshold or has dropped sharply (2), it returns to step S30, resets only the second contraction parameter, and continues the model contraction process using the reset contraction parameter. In contrast, when the weights of the elements in the weight matrices of the layers need not be changed and the inference accuracy is higher than the predetermined threshold (3), the learning/evaluation control unit 10 determines that the contraction parameters do not need to be reset and ends the model contraction process (END).
 In addition, the learning/evaluation control unit 10 may determine whether to continue or end the model contraction process using, as criteria, whether a drop in inference accuracy has occurred a plurality of times in succession (for example, three or more times) and whether the contraction rate has reached a target value. Specifically, when it determines that a drop in inference accuracy has occurred three or more times in succession and the model contraction rate has not reached the target (1), the learning/evaluation control unit 10 returns to step S10, resets the first contraction parameter and the second contraction parameter, and continues the model contraction process using the reset contraction parameters.
 When the learning/evaluation control unit 10 determines that a drop in inference accuracy has occurred three or more times in succession, or that the model contraction rate has not reached the target (2), it returns to step S30, resets only the second contraction parameter, and continues the model contraction process using the reset contraction parameter. In contrast, when the learning/evaluation control unit 10 determines that a drop in inference accuracy has not occurred three or more times in succession and the model contraction rate has reached the target (3), it determines that the contraction parameters do not need to be reset and ends the model contraction process (END).
 Criteria other than these for determining whether or not to continue the model contraction process can be set arbitrarily by the user.
 In resetting the first contraction parameter and the second contraction parameter, the learning/evaluation control unit 10 may set values input by the user as the contraction parameters, or may reset the first contraction parameter and the second contraction parameter automatically by specifying in advance the method (formula, list, table, or the like) used for resetting the model contraction rate.
 Note that the learning/evaluation control unit 10 may determine whether to continue or end the model contraction process without using the inference accuracy evaluation result.
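 The loop of FIG. 4 can be summarized by the following Python sketch. The individual reduction, re-training, and evaluation steps are passed in as callables because the embodiment leaves the concrete methods to the user, and the continuation test of step S60 is reduced here to a single accuracy check, which is simpler than the decision logic described above.

```python
def model_contraction(weights, shape, first_reduce, second_reduce, retrain, evaluate,
                      accuracy_threshold, max_iterations=10):
    """Sketch of the overall flow of steps S10-S60."""
    for _ in range(max_iterations):
        weights = first_reduce(weights)           # S10-S20: set first parameter, change elements
        weights, shape = second_reduce(weights)   # S30-S40: set second parameter, shrink and reshape
        weights = retrain(weights)                # S50: re-optimize the remaining weights
        if evaluate(weights) >= accuracy_threshold:
            break                                 # S60: stop once the requirements are met
    return weights, shape
```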
 <<Specific example of the model contraction method (1)>>
 Next, a specific example of the model contraction method will be described. FIG. 5 is a flow chart showing an example of the model contraction method. Since FIG. 5 corresponds to FIG. 4, the reference numerals of the steps in FIG. 5 are matched with those in FIG. 4. In this example, low-rank approximation is assigned to the first contraction process and neuron pruning is assigned to the second contraction process.
 In step S10, a matrix rank threshold is set as the first contraction parameter. The weight calculation processing unit 30 then performs low-rank approximation using the matrix rank threshold and derives a rank-reduced version of the weight matrix (step S20).
 <<<Method of deriving the rank-reduced matrix>>>
 Here, an example of how the rank-reduced matrix is derived will be described. The weight calculation processing unit 30 performs singular value decomposition of the weight matrix using the following equation (1).
 W = U S ᵗV    (1)
 In equation (1), U is the left singular vector matrix, ᵗV is the transpose of the right singular vector matrix, and S is the singular value diagonal matrix. The diagonal components of the singular value diagonal matrix S consist of L singular values corresponding to the rank L of the weight matrix. The weight calculation processing unit 30 replaces with "0" those singular values that correspond to components with a low contribution to the information content, i.e., those smaller than a predetermined threshold D, and resynthesizes the weight matrix using the replaced values. In this way, the weight calculation processing unit 30 generates a rank-reduced weight matrix. The threshold D may be set by the user as appropriate, based, for example, on a ratio to the rank L, an absolute value for the magnitude of the singular values, or a threshold on the Frobenius norm.
 Each component of the weight matrix rank-reduced in this way is expressed by the following equations (2) and (3).
 [Equations (2) and (3): components of the rank-reduced weight matrix; the equation images are not reproduced here.]
 The method of deriving the rank-reduced matrix has been described above, but methods other than singular value decomposition may also be used; for example, principal component analysis, eigenvalue decomposition, or QR decomposition can be employed. The user can select any of these methods as appropriate.
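 The singular-value-based derivation described above can be sketched as follows in NumPy. The absolute-value threshold D is used directly here, whereas the embodiment also allows thresholds based on the rank ratio or the Frobenius norm; the function name is an assumption for illustration.

```python
import numpy as np

def low_rank_update(W, threshold_d):
    """First contraction by low-rank approximation: decompose W as in equation (1),
    zero out the singular values below the threshold D, and re-synthesize the matrix.
    The size of W is unchanged; only its elements (and its rank) change."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)   # W = U S tV
    s[s < threshold_d] = 0.0                           # drop low-contribution components
    return (U * s) @ Vt                                # rank-reduced weight matrix
```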
 Returning to the description of FIG. 5: next, in step S30, a matrix size reduction ratio is set as the second contraction parameter. The network transformation/resynthesis processing unit 50 then performs neuron pruning, based on the matrix size reduction ratio, on the rank-reduced matrix derived in step S20, and reduces the size of the rank-reduced matrix (step S40). As the unit of the matrix size reduction ratio, for example, the number of neurons or a computation reduction rate is used, and the user can select the unit as appropriate.
 As the method of deleting neurons, for example, a method called "quantization pruning" described below is used. Alternatively, a method using the norm of the weights, which are the elements of the weight matrix, may be adopted. Specifically, in this method, the L1 norm or L2 norm of the weights entering each neuron is used as an evaluation value, and the neurons with the lowest evaluation values, in proportion to the reduction ratio, are deleted. These evaluation values are calculated, for example, by the weight calculation processing unit 30.
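 A minimal sketch of the norm-based variant follows, assuming that neurons correspond to the rows of W and using the L1 norm of the incoming weights as the evaluation value (the choice of rows versus columns and of L1 versus L2 follows the text above).

```python
import numpy as np

def prune_neurons_by_norm(W, reduction_ratio):
    """Second contraction by neuron pruning: score each neuron by the L1 norm of the
    weights entering it (one row of W) and delete the lowest-scoring fraction."""
    scores = np.abs(W).sum(axis=1)                        # evaluation value per neuron
    n_keep = max(1, int(round((1.0 - reduction_ratio) * W.shape[0])))
    kept = np.sort(np.argsort(scores)[-n_keep:])          # indices of the surviving neurons
    return W[kept, :], kept                               # smaller matrix + mapping for reshaping
```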
 Note that, because the weight matrix has been rank-reduced in step S20, neuron pruning based on the evaluation values may affect the matrix rank in this step. The evaluation value is not limited to the L1 norm or L2 norm; any value based on the matrix elements of the rank-reduced weight matrix may be used. It is also possible to perform synapse pruning at the same time as neuron pruning. Since the purpose is to reduce the matrix size, neuron pruning may delete either rows or columns of the weight matrix.
 When the size of the weight matrix has been reduced by neuron pruning or the like, the network transformation/resynthesis processing unit 50 transforms the network shape so that it corresponds to the reduced weight matrix.
 FIG. 6 is an explanatory diagram of model contraction by neuron pruning and synapse pruning. FIG. 6(a) shows a specific example of contraction by neuron pruning and synapse pruning, and FIG. 6(b) illustrates the size-reduced weight matrix after neuron pruning and synapse pruning. The portions indicated by broken lines in FIG. 6(a) are the deleted neurons and synapses. As shown in FIG. 6(a), the model contraction deletes the central neuron in N1 and the second neuron from the left in N3. Accordingly, the synapses connected to the deleted neurons are also deleted. By performing this model contraction on the weight matrix, the size of the weight matrix is reduced, as shown in FIG. 6(b).
 <<<Quantization pruning>>>
 Here, quantization pruning will be described. Since activated neurons do not necessarily respond only to large weights, quantization pruning is performed while leaving the weights distributed discretely. Quantization pruning is applicable not only to neuron pruning but also to synapse pruning.
 FIG. 7 is a diagram illustrating the procedure of quantization pruning. The vertical axis of FIG. 7 is the evaluation value of each neuron; for example, the sum of the weights entering the neuron is used as the evaluation value. The horizontal axis of FIG. 7 is the neuron number. In the example of FIG. 7, for ease of explanation, the neuron numbers are arranged so that they increase as the evaluation value increases. In the case of synapse pruning, the vertical axis may be the weight itself.
 Suppose that, in step S30, the number of neurons to be kept is set as the second contraction parameter. A clustering method is then applied to the weights, and the neurons are grouped into as many clusters as the number of neurons to be kept. Specifically, in FIG. 7(a), the neurons are classified into six clusters. In FIG. 7, the neuron with the largest evaluation value in each cluster is kept and the other neurons are deleted. Since the neuron number increases with the evaluation value in FIG. 7, the rightmost neuron in each cluster remains as the representative neuron. In FIG. 7(a), the evaluation values of the neurons to be deleted are hatched. After the neurons are deleted, the remaining representative neurons are renumbered, and the distribution of evaluation values is updated as shown in FIG. 7(b).
 FIG. 8 is an explanatory diagram showing another example of the procedure of quantization pruning. The example of FIG. 8 is the same as FIG. 7 up to the point where the neurons are classified into clusters. After clustering, in each cluster the neuron whose evaluation value is closest to the centroid value, i.e., the average value of the weights, is kept as the representative neuron, and the other neurons are deleted. The evaluation value of the representative neuron is then overwritten with the centroid value. After the neurons are deleted, the remaining representative neurons are renumbered, and the distribution of evaluation values is updated as shown in FIG. 7(b).
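 The embodiment does not specify a particular clustering method, so the following sketch uses k-means as one possible choice and takes the evaluation value to be the sum of absolute incoming weights; the overwriting of the representative's evaluation value with the centroid value in the FIG. 8 variant is omitted here.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantization_pruning(W, n_keep, use_centroid=False):
    """Cluster the per-neuron evaluation values into n_keep clusters and keep one
    representative neuron per cluster: the one with the largest evaluation value
    (FIG. 7) or the one closest to the cluster centroid (FIG. 8)."""
    scores = np.abs(W).sum(axis=1)                                    # evaluation value per neuron
    labels = KMeans(n_clusters=n_keep, n_init=10).fit_predict(scores.reshape(-1, 1))
    kept = []
    for c in range(n_keep):
        members = np.where(labels == c)[0]
        if use_centroid:
            centroid = scores[members].mean()
            rep = members[np.argmin(np.abs(scores[members] - centroid))]
        else:
            rep = members[np.argmax(scores[members])]
        kept.append(rep)
    kept = np.sort(np.array(kept))
    return W[kept, :], kept
```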
 <<Specific example of the model contraction method (2)>>
 Next, another specific example of the model contraction method will be described. FIG. 9 is a flow chart showing an example of the model contraction method. Since FIG. 9 corresponds to FIG. 4, the reference numerals of the steps in FIG. 9 are matched with those in FIG. 4. In the preceding example, low-rank approximation was assigned to the first contraction process and neuron pruning to the second contraction process; in this example, synapse pruning is assigned to the first contraction process and low-rank approximation to the second contraction process. Thus, in the present embodiment, the contraction process can be executed even if the contents of the first and second contraction processes are interchanged.
 In step S10, a connection pruning ratio is set as the first contraction parameter. The weight calculation processing unit 30 then performs synapse pruning using the connection pruning ratio and changes the elements of the weight matrix, thereby updating the weight matrix with a reduced amount of information (step S20). In step S20, the weight calculation processing unit 30 performs synapse pruning using, for example, a method such as the quantization pruning based on the centroid value described with reference to FIG. 8.
 In step S30, for example, a rank reduction ratio is set as the second contraction parameter. In step S40, the network transformation/resynthesis processing unit 50 performs low-rank approximation using, for example, the rank reduction ratio, and reduces the matrix size of the weight matrix. The low-rank approximation uses the processing described above for deriving the rank-reduced matrix. For reducing the matrix size, a sequentially executable low-rank approximation using QR decomposition, which is not a singular value decomposition, is efficient. The network transformation/resynthesis processing unit 50 transforms the network shape so that it corresponds to the size-reduced weight matrix.
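 The embodiment notes that a sequentially executable approximation based on QR decomposition is efficient; the sketch below instead uses a truncated singular value decomposition, purely because it is shorter to write, and illustrates only how the matrix size (and hence the network shape) shrinks when one weight matrix is replaced by two thinner factors.

```python
import numpy as np

def factorize_low_rank(W, rank):
    """Second contraction by low-rank approximation that actually shrinks the matrices:
    W (m x n) is replaced by two factors A (m x rank) and B (rank x n), so one layer
    becomes two thinner layers and the network shape is deformed accordingly."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]       # m x rank
    B = Vt[:rank, :]                 # rank x n
    return A, B                      # A @ B approximates W with fewer parameters
```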
 <Main effects of the present embodiment>
 According to the present embodiment, the second contraction process is performed on the weight matrix updated by the first contraction process, and a size-reduced weight matrix is generated. With this configuration, the first contraction process and the second contraction process are performed consecutively, so that an increase in the design period associated with model contraction can be suppressed while a plurality of contraction methods are combined.
 Also, according to the present embodiment, learning processing using the learning model is performed on the size-reduced weight matrix. With this configuration, it is not necessary to perform learning processing after every individual contraction step, so that an increase in the design period associated with model contraction is suppressed.
 Also, according to the present embodiment, inference accuracy evaluation using the inference accuracy evaluation data is performed on the size-reduced weight matrix. With this configuration, the accuracy of the neural network model after the contraction process can be evaluated.
 Also, according to the present embodiment, whether or not to continue the model contraction process is determined using the analysis result of the learning process. With this configuration, model contraction can be repeated, and a contracted model suited to the user's requirements can be generated.
 Also, according to the present embodiment, whether or not to continue the model contraction process is determined based on the analysis result of the learning process and the inference accuracy evaluation result. With this configuration, a contracted model that better suits the user's requirements can be generated.
 (Embodiment 2)
 Next, the second embodiment will be described. In the following, descriptions of parts that overlap with the above-described embodiment are omitted in principle. In the second and third embodiments described below, the first contraction process, which changes the elements of the weight matrix, is performed using both the first contraction parameter and the second contraction parameter.
 FIG. 10 is a block diagram showing an example of the configuration of the neural network model contraction device according to the second embodiment of the present invention. The model contraction device 201 of FIG. 10 includes a learning/evaluation control unit 10, a first contraction parameter reception unit 20, a weight calculation processing unit 230, a second contraction parameter reception unit 40, a network transformation/resynthesis processing unit 50, a weight matrix storage unit 60, a learning processing unit 70, an inference accuracy evaluation unit 80, and the like.
 The second contraction parameter reception unit 40 outputs the second contraction parameter input from the learning/evaluation control unit 10 to the weight calculation processing unit 230 and the network transformation/resynthesis processing unit 50. The weight calculation processing unit 230 performs the first contraction process on the weight matrix using the first contraction parameter and the second contraction parameter.
 FIG. 11 is a flow chart showing the outline of the model contraction method according to the second embodiment of the present invention. In the present embodiment, after the first contraction parameter is set in step S10, the second contraction parameter is set in step S30. In step S220, the weight calculation processing unit 230 performs the first contraction process on the weight matrix using the first contraction parameter and the second contraction parameter, and stores the weight matrix updated by the first contraction process in the weight matrix storage unit 60. The weight matrix updated here depends on both the first contraction parameter and the second contraction parameter.
 In step S240, the network transformation/resynthesis processing unit 50 performs the second contraction process, which reduces the matrix size, on the weight matrix updated in step S220. The network transformation/resynthesis processing unit 50 also transforms the network shape so that it corresponds to the weight matrix updated by the second contraction process. The transformed network shape is affected by both the first contraction parameter and the second contraction parameter. The network transformation/resynthesis processing unit 50 stores the updated weight matrix and the transformed network shape in the respective storage units of the model storage unit 100.
 <<Specific example of the model contraction method (3)>>
 Next, a specific example of the model contraction method in the present embodiment will be described. FIG. 12 is a flow chart showing an example of the model contraction method according to the second embodiment. Since some steps in FIG. 12 correspond to those in FIG. 11, the corresponding steps are denoted by the same reference numerals. In this example, low-rank approximation is assigned to the first contraction process and neuron pruning to the second contraction process.
 In step S10, a matrix rank threshold that defines the threshold of the matrix rank is set as the first contraction parameter. In step S30, a difference threshold is set as the second contraction parameter. The difference referred to here is the difference between the matrix rank of the original weight matrix before rank reduction and the matrix rank of the weight matrix after rank reduction, and is defined by the following equation (4). The learning/evaluation control unit 10 sets the difference threshold (δ) for this difference (Rij) as the second contraction parameter.
 [Equation (4): definition of the difference Rij; the equation image is not reproduced here.]
 In step S220, the weight calculation processing unit 230 performs low-rank approximation on the weight matrix using the matrix rank threshold and the difference threshold. The weight calculation processing unit 230 changes the weight of each element of the weight matrix, for example, in accordance with the following equation (5).
 [Equation (5): element-wise update rule for the weight matrix based on the difference Rij and the difference threshold δ; the equation image is not reproduced here.]
 Specifically, for elements whose difference (Rij) is smaller than the difference threshold (δ), the weight is left unchanged (δ > Rij); for elements whose difference (Rij) is larger than the difference threshold (δ), the weight is changed to 0 (δ < Rij). In this way, the weight calculation processing unit 230 updates the weight matrix. That is, the weight calculation processing unit 230 compares the difference with the difference threshold, sets to 0 those components that are strongly affected by the rank reduction, i.e., whose difference is at or above the difference threshold, and for the other components uses the weights of the rank-reduced matrix as the elements when updating the weight matrix. The weight calculation processing unit 230 then derives the rank-reduced matrix by the same method as in the first embodiment.
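 Since the equation images for (4) and (5) are not reproduced, the sketch below assumes that the difference Rij is taken element-wise between the original weights and the rank-reduced weights; this is an interpretation of the text above, not a reproduction of the equations.

```python
import numpy as np

def threshold_by_difference(W, W_rank_reduced, delta):
    """Combine the rank-reduced matrix with the difference threshold delta:
    components strongly affected by the rank reduction are set to 0,
    the other components keep the rank-reduced weight."""
    R = np.abs(W - W_rank_reduced)        # difference R_ij, assumed element-wise here
    return np.where(R < delta, W_rank_reduced, 0.0)
```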
 Then, in step S230, the network transformation/resynthesis processing unit 50 calculates the neuron pruning ratio for the rank-reduced matrix derived in step S220. For example, the network transformation/resynthesis processing unit 50 may calculate the neuron pruning ratio from the number of zero-filled components in the rank-reduced matrix.
 In step S240, the network transformation/resynthesis processing unit 50 performs neuron pruning according to the neuron pruning ratio calculated in step S230 and reduces the size of the weight matrix. The network transformation/resynthesis processing unit 50 also transforms the network shape so that it corresponds to the size-reduced weight matrix.
 When the neuron pruning ratio is calculated from the number of zero-filled components in the rank-reduced matrix, appropriate contraction parameters for each layer can be set automatically from a single difference threshold.
 According to the present embodiment, the first contraction process is performed using the first contraction parameter and the second contraction parameter. With this configuration, the weight matrix can be updated under the influence of both the first contraction parameter and the second contraction parameter.
 (Embodiment 3)
 Next, the third embodiment will be described. FIG. 13 is a block diagram showing an example of the configuration of the neural network model contraction device according to the third embodiment of the present invention. The model contraction device 301 of FIG. 13 is similar to that of FIG. 10, but differs in that a contraction parameter calculation processing unit 315 is provided between the learning/evaluation control unit 310 on the one hand and the first contraction parameter reception unit 20 and the second contraction parameter reception unit 40 on the other.
 The learning/evaluation control unit 310 calculates, as a contraction parameter, the contraction specific gravity, i.e., the ratio between the amount of contraction by the most recently executed first contraction process and the amount of contraction by the second contraction process. The contraction parameter calculation processing unit 315 calculates the first contraction parameter and the second contraction parameter using the contraction specific gravity calculated by the learning/evaluation control unit 310, and outputs the calculated first contraction parameter and second contraction parameter to the first contraction parameter reception unit 20 and the second contraction parameter reception unit 40, respectively.
 Thus, in the present embodiment, the first contraction parameter and the second contraction parameter are calculated automatically using the contraction specific gravity, so that, apart from the initial values, the user does not directly input the first contraction parameter or the second contraction parameter.
 FIG. 14 is a flow chart showing the outline of the model contraction method according to the third embodiment of the present invention. FIG. 14 is similar to FIG. 11, and corresponding steps are denoted by the same reference numerals. In step S310, the learning/evaluation control unit 310 calculates the contraction specific gravity and outputs it to the contraction parameter calculation processing unit 315.
 In step S320, the contraction parameter calculation processing unit 315 calculates the first contraction parameter and the second contraction parameter using the input contraction specific gravity. The calculated first contraction parameter is output to the weight calculation processing unit 30 via the first contraction parameter reception unit 20, and the calculated second contraction parameter is output to the weight calculation processing unit 30 and the network transformation/resynthesis processing unit 50 via the second contraction parameter reception unit 40.
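 The embodiment does not specify how the contraction specific gravity is mapped to the two parameters; the following is a purely hypothetical illustration (the function name, the notion of a single overall reduction target, and the proportional split are all assumptions made for the sketch).

```python
def split_by_specific_gravity(total_reduction, specific_gravity):
    """Hypothetical split of an overall reduction target into the first and second
    contraction parameters according to the contraction specific gravity, i.e. the
    ratio between the amounts reduced by the two processes in the previous pass."""
    first = total_reduction * specific_gravity / (1.0 + specific_gravity)
    second = total_reduction - first
    return first, second
```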
 In step S360, when the learning/evaluation control unit 310 determines that the model contraction process is to be continued (Yes), it returns to step S10 and calculates the contraction parameter (contraction specific gravity). When the learning/evaluation control unit 310 determines that the model contraction process is not to be continued (No), the model contraction process ends.
 According to the present embodiment, when the model contraction process is continued, the only parameter to be updated is the contraction specific gravity, so the number of loop iterations can be reduced and the increase in the design period associated with model contraction can be suppressed even further.
 The present invention is not limited to the embodiments described above and includes various modifications. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. Part of the configuration of each embodiment can also be added to, deleted from, or replaced with another configuration. The members and relative sizes shown in the drawings are simplified and idealized in order to explain the present invention clearly, and more complicated shapes may result in actual implementations.
 1, 201, 301: model contraction device; 10, 310: learning/evaluation control unit; 30, 230: weight calculation processing unit; 50: network transformation/resynthesis processing unit; 70: learning processing unit; 80: inference accuracy evaluation unit; 315: contraction parameter calculation processing unit

Claims (9)

  1.  A model contraction device for a neural network, wherein
     the model of the neural network includes a network shape and a weight matrix corresponding to the network shape,
     the device comprising:
     a first contraction processing unit that performs a first contraction process of changing elements of the weight matrix using a first contraction parameter, thereby updating the weight matrix; and
     a second contraction processing unit that performs a second contraction process of reducing the size of the updated weight matrix using a second contraction parameter, and deforms the network shape in accordance with the reduced weight matrix.
  2.  The model contraction device according to claim 1, further comprising a control unit that sets the first contraction parameter and the second contraction parameter.
  3.  The model contraction device according to claim 2, further comprising a learning processing unit that executes analysis processing using the size-reduced weight matrix on the basis of training data, compares the analysis result with the training data, and optimizes each element of the size-reduced weight matrix.
  4.  The model contraction device according to claim 3, wherein the control unit determines, based on the analysis result, whether or not to continue the model contraction.
  5.  The model contraction device according to claim 3, further comprising
     an inference accuracy evaluation unit that performs an inference accuracy evaluation on the size-reduced weight matrix using inference accuracy evaluation data.
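     One possible form of the inference accuracy evaluation of claim 5, assuming a classification task in which the size-reduced weight matrix maps feature vectors directly to class scores; the data shapes and metric are illustrative assumptions.

import numpy as np

def evaluate_inference_accuracy(W_reduced, x_eval, y_eval):
    # Inference accuracy evaluation (illustrative): run inference with the
    # size-reduced weight matrix on the evaluation data and report how often
    # the highest-scoring class matches the label.
    scores = x_eval @ W_reduced.T
    predicted = np.argmax(scores, axis=1)
    return float(np.mean(predicted == y_eval))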
  6.  The model contraction device according to claim 5, wherein
     the control unit determines whether or not to continue the model contraction based on the analysis result and the inference accuracy evaluation result obtained by the inference accuracy evaluation unit.
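     The continuation decisions of claims 4 and 6 could, for example, be reduced to threshold checks on the analysis result and on the inference accuracy evaluation result; the thresholds below are placeholders chosen only for illustration.

def continue_contraction(train_loss, eval_accuracy, max_loss=0.10, min_accuracy=0.90):
    # Continuation decision (illustrative): keep contracting only while the
    # analysis result (training loss) and the inference accuracy evaluation
    # result both remain within user-chosen limits.
    return train_loss <= max_loss and eval_accuracy >= min_accuracy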
  7.  The model contraction device according to claim 1, wherein
     each of the first contraction process and the second contraction process is one of low-rank approximation, neuron pruning, and synapse pruning.
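     Of the three contraction methods named in claim 7, low-rank approximation reduces matrix size without removing individual neurons or synapses; a common realization, given here only as an illustrative sketch, truncates the singular value decomposition of the weight matrix.

import numpy as np

def low_rank_approximation(W, rank):
    # Low-rank approximation (illustrative): factor the m x n weight matrix W
    # into A (m x rank) and B (rank x n) with a truncated SVD, so that one
    # large matrix product is replaced by two smaller ones.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]
    B = Vt[:rank, :]
    return A, B

W = np.random.randn(64, 128)
A, B = low_rank_approximation(W, rank=16)
print(np.linalg.norm(W - A @ B) / np.linalg.norm(W))   # relative approximation error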
  8.  The model contraction device according to claim 1, wherein
     the first contraction processing unit performs the first contraction process using the first contraction parameter and the second contraction parameter.
  9.  The model contraction device according to claim 1, further comprising:
     a control unit that calculates, as a contraction parameter, the contraction weight ratio between the amount of contraction produced by the most recently executed first contraction process and the amount of contraction produced by the second contraction process; and
     a contraction parameter calculation processing unit that calculates the first contraction parameter and the second contraction parameter using the contraction weight ratio.
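     A minimal sketch of the contraction weight ratio of claim 9, assuming the amount of contraction is measured by the number of weights removed by each process; how the ratio is converted into the next first and second contraction parameters is likewise an illustrative assumption.

def contraction_weight_ratio(amount_first, amount_second):
    # Contraction weight ratio (illustrative): relative share of the contraction
    # achieved by the most recently executed first contraction process, with the
    # amounts measured here as counts of removed weights.
    total = amount_first + amount_second
    return amount_first / total if total > 0 else 0.5

def next_contraction_parameters(ratio, contraction_budget):
    # Contraction parameter calculation (illustrative): split the next round's
    # contraction budget between the two processes according to the ratio.
    return ratio * contraction_budget, (1.0 - ratio) * contraction_budget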
PCT/JP2020/011067 2019-03-22 2020-03-13 Model reduction device of neural network WO2020195940A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-055025 2019-03-22
JP2019055025A JP7150651B2 (en) 2019-03-22 2019-03-22 Neural network model reducer

Publications (1)

Publication Number Publication Date
WO2020195940A1 true WO2020195940A1 (en) 2020-10-01

Family

ID=72559403

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/011067 WO2020195940A1 (en) 2019-03-22 2020-03-13 Model reduction device of neural network

Country Status (2)

Country Link
JP (1) JP7150651B2 (en)
WO (1) WO2020195940A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2022136575A (en) * 2021-03-08 2022-09-21 オムロン株式会社 Inference device, model generation device, inference method, and inference program
JP2023083997A (en) 2021-12-06 2023-06-16 株式会社デンソー Model generation method, model generation program, model generation device, and data processing device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018129033A (en) * 2016-12-21 2018-08-16 アクシス アーベー Artificial neural network class-based pruning
US20180232640A1 (en) * 2017-02-10 2018-08-16 Samsung Electronics Co., Ltd. Automatic thresholds for neural network pruning and retraining

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SASAKI, KENTA ET AL.: "Non-official translation: Adaptive compression of deep learning models by parameter reduction and reconstruction", DEIM FORUM 2018 FL-2, THE 10TH FORUM ON DATA ENGINEERING AND INFORMATION MANAGEMENT (THE 16TH ANNUAL CONFERENCE OF THE DATABASE SOCIETY OF JAPAN), 6 March 2018 (2018-03-06) *
WATANABE, CHIHIRO ET AL.: "Non-official translation: Extracting rough structure based on combined weights of positive and negative in multilayered neural networks", FORUM ON INFORMATION TECHNOLOGY 2017 (FIT2017), 5 September 2017 (2017-09-05) *

Also Published As

Publication number Publication date
JP2020155010A (en) 2020-09-24
JP7150651B2 (en) 2022-10-11

Similar Documents

Publication Publication Date Title
US11875268B2 (en) Object recognition with reduced neural network weight precision
US11568258B2 (en) Operation method
KR101880901B1 (en) Method and apparatus for machine learning
KR102410820B1 (en) Method and apparatus for recognizing based on neural network and for training the neural network
US20190278600A1 (en) Tiled compressed sparse matrix format
US20170004399A1 (en) Learning method and apparatus, and recording medium
US20180260709A1 (en) Calculating device and method for a sparsely connected artificial neural network
US20190050734A1 (en) Compression method of deep neural networks
WO2019091020A1 (en) Weight data storage method, and neural network processor based on method
US9129222B2 (en) Method and apparatus for a local competitive learning rule that leads to sparse connectivity
US11392829B1 (en) Managing data sparsity for neural networks
CN107292352B (en) Image classification method and device based on convolutional neural network
CN111105029B (en) Neural network generation method, generation device and electronic equipment
WO2022105108A1 (en) Network data classification method, apparatus, and device, and readable storage medium
JP6950756B2 (en) Neural network rank optimizer and optimization method
US11657285B2 (en) Methods, systems, and media for random semi-structured row-wise pruning in neural networks
US11775832B2 (en) Device and method for artificial neural network operation
CN110659725A (en) Neural network model compression and acceleration method, data processing method and device
CN109447096B (en) Glance path prediction method and device based on machine learning
WO2020195940A1 (en) Model reduction device of neural network
US20190311248A1 (en) Method for random sampled convolutions with low cost enhanced expressive power
CN116805157B (en) Unmanned cluster autonomous dynamic evaluation method and device
KR20220032861A (en) Neural architecture search method and attaratus considering performance in hardware
WO2022127603A1 (en) Model processing method and related device
US20220121927A1 (en) Providing neural networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20779296

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20779296

Country of ref document: EP

Kind code of ref document: A1