WO2020008643A1 - Data processing device, data processing circuit, and data processing method

Info

Publication number
WO2020008643A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
precision
accuracy
communication path
processing
Prior art date
Application number
PCT/JP2018/025773
Other languages
French (fr)
Japanese (ja)
Inventor
芙美代 鷹野
誠也 柴田
竹中 崇
浩明 井上
Original Assignee
日本電気株式会社
Priority date
Filing date
Publication date
Application filed by 日本電気株式会社
Priority to JP2020528664A (patent JP7120308B2)
Priority to PCT/JP2018/025773 (patent WO2020008643A1)
Publication of WO2020008643A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present invention relates to a data processing device, a data processing circuit, and a data processing method for performing data processing including calculations with two types of calculation accuracy.
  • In machine learning, for example, parameters of the arithmetic expressions and discriminants used in a predetermined learning device are adjusted based on the relationship between input and output indicated by learning data.
  • the learning device is, for example, a discrimination model that performs discrimination on one or a plurality of labels when data is input.
  • Non-Patent Document 1 describes an example of a learning calculation circuit and a learning method for executing deep learning of a neural network efficiently, particularly with low power consumption.
  • Non-Patent Document 2 describes an example of a learning method for deep learning in a CNN (Convolutional Neural Network) that shortens the learning time by limiting the learning range: the plurality of convolutional layers is divided into layers in which the weights are fixed and layers in which the weights are updated (extended function layers).
  • Non-Patent Document 3 describes an optimization example of accelerator design based on FPGA (Field-Programmable Gate Array) as an example of a circuit configuration for learning operation in machine learning.
  • a mechanism that allows learning to be performed not in a cloud environment but in a device at the site (the edge device layer).
  • a learning method that can obtain a sufficient recognition rate with less computer resources and thus lower power consumption is desired.
  • According to Non-Patent Document 1, using a 16-bit fixed-point arithmetic circuit allows learning to be realized with lower power consumption than NVIDIA's TK1 (Jetson Kit), which performs learning using a 32-bit floating-point arithmetic circuit.
  • this method is intended to reduce the power consumption by reducing the bit width in the arithmetic circuit that performs all the learning operations (all the operations for adjusting the parameters) in exchange for a decrease in the operation accuracy.
  • no consideration is given to the adverse effects caused by a reduction in the calculation accuracy of the calculation circuit itself. For example, no consideration is given to the possibility that sufficient calculation accuracy for performing the learning calculation is not ensured.
  • a multi-layer operation using a configuration in which a plurality of units are connected in a layered manner is performed.
  • the multi-layer operation is performed by calculating a unit output for each layer (so-called inference).
  • processing for example, forward propagation processing
  • parameter updating processing for example, back propagation processing
  • the parameter update processing corresponds to the actual learning operation in machine learning. The calculation accuracy of the parameter update processing therefore greatly affects the recognition rate during operation, and the higher the accuracy, the better.
  • the calculation accuracy of the inference processing does not need to be very high in many cases.
  • an apparatus that performs a process in which operations that require high accuracy and operations that do not are mixed is assumed to have arithmetic circuits of two types of operation accuracy, and to execute each operation of the process while switching the circuit that executes it according to the accuracy the operation requires.
  • data exchange between cores (arithmetic circuits) having different arithmetic accuracies is an essential requirement in the device.
  • Non-Patent Document 2 merely aims to reduce the learning time by limiting the learning range. It gives no consideration to improving the efficiency of learning processing that includes data exchange between cores having different calculation accuracy, and in particular none to making that data exchange itself efficient. Likewise, the method described in Non-Patent Document 3 merely reduces the circuit scale and the calculation time by optimizing the configuration of a circuit that performs all the learning operations. It also gives no consideration to improving the efficiency of learning processing that includes data exchange between cores having different calculation accuracy, particularly the efficiency of the data exchange itself.
  • the present invention has been made in view of the above-described problems, and its purpose is to provide a data processing device, a data processing circuit, and a data processing method that can further increase the efficiency of a process in which an operation that requires high accuracy and an operation that does not require high accuracy are mixed.
  • a data processing device according to the present invention includes: low-precision arithmetic processing means that performs a predetermined operation with a first accuracy; high-precision arithmetic processing means that performs a predetermined operation with a second accuracy higher than the first accuracy; and first data conversion means provided at the end, on the high-precision arithmetic processing means side, of a communication path for transferring data between the high-precision arithmetic processing means and the low-precision arithmetic processing means.
  • the first data conversion means performs a predetermined conversion on the data passing between the communication path and the high-precision arithmetic processing means so that the data passed to and from the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is equal to or less than the data amount when first-accuracy data is used, and the accuracy of the data passing through the communication path is equal to or less than the first accuracy.
  • a data processing circuit according to the present invention includes: a low-precision arithmetic circuit that performs a predetermined operation at a first accuracy; a high-precision arithmetic circuit that performs a predetermined operation at a second accuracy higher than the first accuracy; and a first data conversion circuit that is provided at the end, on the high-precision arithmetic circuit side, of a communication path for transmitting data between the high-precision arithmetic circuit and the low-precision arithmetic circuit, and that performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic circuit.
  • the first data conversion circuit performs the conversion so that the data passed between the first data conversion circuit and the connected high-precision arithmetic circuit is data that the high-precision arithmetic circuit can handle, the amount of data passing through the communication path is equal to or less than the data amount when first-accuracy data is used, and the accuracy of the data passing through the communication path is equal to or less than the first accuracy.
  • in a data processing method according to the present invention, first data conversion means is provided at the end, on the high-precision arithmetic processing means side, of a communication path for transferring data between low-precision arithmetic processing means that performs a predetermined operation with a first accuracy and high-precision arithmetic processing means that performs a predetermined operation with a second accuracy higher than the first accuracy.
  • the first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means so that the data passed to and from the high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is equal to or less than the data amount when first-accuracy data is used, and the accuracy of the data passing through the communication path is equal to or less than the first accuracy.
  • the first data conversion means may be configured to perform a predetermined conversion on the data passing between the communication path and the high-precision arithmetic processing means so that the data passed to and from the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is smaller than the data amount when first-precision data is used, and the precision of the data passing through the communication path is lower than the first precision.
  • a second data conversion means may be provided at the end of the communication path on the low-precision arithmetic processing means side to perform a predetermined conversion on the data passing through it.
  • the second data conversion means may be configured to perform the predetermined conversion on the data passing between the communication path and the low-precision arithmetic processing means so that the data passed to and from the low-precision arithmetic processing means is data that the low-precision arithmetic processing means can handle, the amount of data passing through the communication path is smaller than the data amount when first-precision data is used, and the precision of the data passing through the communication path is lower than the first precision.
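  • As an illustration of the data conversion at the high-precision end of the communication path, the following is a minimal sketch and not the conversion defined by the invention. It assumes the path carries signed 8-bit fixed-point codes with 4 fractional bits; the function names `to_fixed` and `from_fixed` and the sample values are hypothetical.

```python
def to_fixed(values, frac_bits=4, total_bits=8):
    """Hypothetical converter: high-precision floats -> narrow fixed-point codes.

    Values outside the representable range are saturated (clamped).
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) for v in values]


def from_fixed(codes, frac_bits=4):
    """Inverse direction: codes taken from the path -> floats for the high-precision side."""
    scale = 1 << frac_bits
    return [c / scale for c in codes]


# Values produced on the high-precision side (assumed sample data).
high_precision_values = [0.7312, -1.25, 3.999, -9.5]

on_path = to_fixed(high_precision_values)   # what travels over the communication path
restored = from_fixed(on_path)              # what the high-precision side reads back

print(on_path)    # [12, -20, 64, -128]
print(restored)   # [0.75, -1.25, 4.0, -8.0]
```

  • In this sketch each value occupies 8 bits on the path instead of the width of a float, at the cost of rounding (0.7312 becomes 0.75) and saturation (-9.5 becomes -8.0).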
  • FIG. 3 is an explanatory diagram illustrating an outline of a learning method as an example of the data processing method of the present invention. It is an explanatory view showing an example of input and output of a certain unit, and combination with another unit. It is a block diagram showing an example of composition of a learning device of a 1st embodiment.
  • FIG. 2 is a configuration diagram illustrating an example of a hardware configuration of a learning processing unit 106.
  • FIG. 4 is an explanatory diagram showing an example of a combination of the calculation accuracy in the low precision calculation circuit 11 and the calculation precision in the high precision calculation circuit 12.
  • FIG. 2 is a schematic block diagram illustrating a configuration example of a computer according to the learning device 100.
  • FIG. 3 is a schematic configuration diagram illustrating an example of an arithmetic circuit.
  • FIG. 9 is a schematic configuration diagram illustrating another example of the arithmetic circuit.
  • FIG. 9 is a schematic configuration diagram illustrating another example of the arithmetic circuit.
  • FIG. 9 is a schematic configuration diagram illustrating another example of the arithmetic circuit.
  • 4 is a flowchart illustrating an example of an operation of the learning device 100 according to the first embodiment.
  • 6 is a flowchart illustrating a more specific operation example of the learning device 100.
  • 9 is a flowchart illustrating another example of a more specific operation of the learning device 100.
  • 9 is a flowchart illustrating another example of a more specific operation of the learning device 100.
  • FIG. 9 is an explanatory diagram illustrating a configuration example of a data processing device according to a second embodiment.
  • FIG. 9 is an explanatory diagram illustrating a configuration example of a data processing device according to a second embodiment. It is a block diagram showing the outline of the data processor of the present invention.
  • FIG. 2 is a configuration diagram illustrating a configuration of a data processing circuit of the present invention.
  • FIG. 9 is a configuration diagram illustrating another configuration of the data processing circuit of the present invention.
  • FIG. 1A is an explanatory diagram showing an example of a general learning method in a neural network including one or more intermediate layers between an input layer and an output layer, and a circuit configuration therefor.
  • FIG. 1B is an explanatory diagram showing an example of a learning method as an example of the data processing method of the present invention and an example of a circuit configuration therefor.
  • a large-scale learning circuit 90 is used to learn the entire neural network, which is a predetermined discriminant model, in order to support a learning algorithm for general use.
  • balloons attached to the circuits schematically show directions and ranges of processing in the learning process of the neural network.
  • reference numeral 51 (circle in the figure) represents a unit corresponding to a neuron in the neural network.
  • Reference numeral 52 (a line connecting the units in the drawing) represents an inter-unit connection.
  • Reference numeral 53 (the right-handed bold arrow in the figure) indicates the inference processing and its range.
  • Reference numeral 54 (a thick arrow pointing left in the figure) indicates a parameter update process and its range.
  • Although FIG. 1 shows an example of a feedforward neural network in which the input to each unit is the output of a unit in the preceding layer, the input to each unit is not limited to this.
  • the input to each unit can include the output of the unit of the preceding layer at the previous time, as in a recurrent neural network.
  • the direction of the inference processing is considered to be the direction (forward direction) from the input layer to the output layer.
  • Such inference processing performed in a predetermined order from the input layer is also called “forward propagation”.
  • the direction of the parameter update processing is not particularly limited. The direction may be a direction from the output layer to the input layer (reverse direction) as in the parameter update processing in the figure.
  • the direction of the parameter update processing in the figure is an example of the error back propagation method
  • the parameter update processing is not limited to the error back propagation method.
  • the parameter update processing may be STDP (Spike Timing Dependent Plasticity).
  • examples of the method of learning a model in deep learning include the following. First, after learning data is input to the input layer, an inference process calculates the output of each unit layer by layer in the forward direction up to the output layer (forward propagation: see arrow 53 in the figure). Next, an error is calculated from the output of the output layer (the final output) and the relationship between input and output indicated by the learning data. To minimize this error, the layers are traced in the reverse direction from the output layer to the first layer, and a parameter update process updates the parameters used to calculate the output of each unit in each layer (back propagation: see arrow 54 in the figure).
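  • The two phases described above can be sketched on a toy network with one unit per layer. This is an illustrative assumption, not the network of the figures: the sigmoid activation, the squared-error function, the learning rate, and all names (`forward`, `backward`, `w1`, `w2`) are hypothetical choices.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def forward(x, w1, w2):
    """Inference (forward propagation): compute unit outputs layer by layer."""
    z1 = sigmoid(w1 * x)     # output of the first-layer unit
    z2 = sigmoid(w2 * z1)    # output of the output-layer unit
    return z1, z2


def backward(x, target, w1, w2, lr=0.5):
    """Parameter update (back propagation) for the error E = 0.5 * (z2 - target)**2."""
    z1, z2 = forward(x, w1, w2)
    delta2 = (z2 - target) * z2 * (1.0 - z2)   # dE/du at the output layer
    delta1 = delta2 * w2 * z1 * (1.0 - z1)     # propagated back to the first layer
    return w1 - lr * delta1 * x, w2 - lr * delta2 * z1


x, target = 1.0, 0.0
w1, w2 = 0.5, 0.5
err_before = 0.5 * (forward(x, w1, w2)[1] - target) ** 2
for _ in range(100):
    w1, w2 = backward(x, target, w1, w2)
err_after = 0.5 * (forward(x, w1, w2)[1] - target) ** 2
print(err_before, err_after)
```

  • Running the loop lowers the error, which is the only property this sketch is meant to show.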
  • FIG. 1A shows a large-scale learning circuit 90 that performs the above-described inference processing and parameter updating processing with high calculation accuracy as an example of realizing an arithmetic circuit that performs such learning.
  • the higher the calculation accuracy of the inference process and the parameter update process, and the wider the calculation range of the process the larger the number of expansion terms of the error function and the size of the circuit, resulting in a large increase in power consumption.
  • the learning here refers to a parameter updating process, which is a more actual learning process, as described above.
  • the process is performed in the same manner as described above up to forward propagation.
  • a designated unit for example, the nth layer which is the output layer
  • a parameter updating process for updating the parameter for calculating the output of the unit (for example, the weight for coupling with another unit) is performed.
  • FIG. 1B shows, as an implementation example of the arithmetic circuit 10 that performs such learning, a combination of a high-precision arithmetic circuit 12 that performs parameter update processing of some specified units with high arithmetic accuracy and a low-precision arithmetic circuit 11 that performs inference processing of at least the specified units with an arithmetic accuracy lower than that of the high-precision arithmetic circuit 12.
  • the high-precision arithmetic circuit 12 is caused to perform the parameter update processing for the units that require high-precision operation, and the low-precision arithmetic circuit 11 performs the other processing that does not require high-precision operation.
  • At least a part of the inference processing is performed with a low calculation accuracy, and at least a part of the parameter update processing is performed with a high calculation accuracy.
  • Although FIG. 1B shows an example in which some layers on the output side are set as the range for updating parameters (the actual learning range), the range for updating parameters is not limited to the layers on the output side. It is also possible to specify layers individually, for example the odd layers or the even layers among the first to n-th layers.
  • FIG. 1B shows an example in which the range of the parameter update process itself is limited. Alternatively, the range of the parameter update process itself may be left unrestricted while only the range of the parameter update process performed with high calculation accuracy is limited. That is, it is possible to perform the parameter update processing with high calculation accuracy only for some of the units and with low calculation accuracy for the other units. Note that, with respect to the parameter update processing, units can be divided into three types: units updated by high-precision calculation, units updated by low-precision calculation, and units not updated (their parameters are fixed at that time).
  • For example, it is possible to perform the inference processing of all the units by low-precision operation and the parameter update processing of all the units by high-precision operation.
  • Further, for example, it is also possible to perform the inference processing of all the units by low-precision calculation and the parameter update processing of some units by high-precision calculation. In that case, for the remaining units excluded from the high-precision calculation, the parameter update processing may be performed by low-precision calculation or may be omitted. Further, for example, it is also possible to perform inference processing and parameter update processing by low-precision calculation for some units, and by high-precision calculation for the remaining units.
  • the learning device includes a low-precision operation circuit having a relatively low operation accuracy and a high-precision operation circuit having a relatively high operation accuracy.
  • Any configuration may be used as long as it causes the low-precision arithmetic circuit to perform inference processing for at least some of the units and the high-precision arithmetic circuit to perform parameter update processing for at least some of the units.
  • the inference processing of some of the remaining units may be performed by a low-precision arithmetic circuit or a high-precision arithmetic circuit.
  • the parameter update processing of the remaining part of the units may be performed by a low-precision arithmetic circuit, or the processing itself may be omitted. There is no particular limitation on which units are subject to high-precision or low-precision inference processing, or on which units are subject to high-precision parameter update processing, low-precision parameter update processing, or no parameter update processing.
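  • One of the allocations described above (all inference at low precision, parameter update at high precision for only some units) can be sketched as follows. This is a minimal sketch under stated assumptions: the low-precision circuit is imitated by rounding to a fixed-point grid with 4 fractional bits, the high-precision circuit by ordinary Python floats, and the names `quantize`, `infer_low`, `update_high` and all numeric values are hypothetical.

```python
def quantize(v, frac_bits=4):
    """Round to a fixed-point grid; stands in for the low-precision circuit."""
    scale = 1 << frac_bits
    return round(v * scale) / scale


def infer_low(x, w):
    """Low-precision inference: weights, inputs, and result all lie on the coarse grid."""
    return quantize(sum(quantize(wi) * quantize(xi) for wi, xi in zip(w, x)))


def update_high(w, x, err, lr=0.1):
    """High-precision parameter update, kept in full floating point."""
    return [wi - lr * err * xi for wi, xi in zip(w, x)]


w_fixed = [0.5, 0.25]   # unit whose parameters are not updated
w_learn = [0.3]         # unit whose parameter is updated with high precision
x, target = [1.0, 2.0], 1.0

for _ in range(200):
    h = infer_low(x, w_fixed)        # inference, low precision
    y = infer_low([h], w_learn)      # inference, low precision
    w_learn = update_high(w_learn, [h], y - target)

y_final = infer_low([infer_low(x, w_fixed)], w_learn)
print(w_learn, y_final)
```

  • Near convergence the update step in this sketch (0.1 × 1/16) is smaller than the 1/16 grid spacing, so a weight stored only on the coarse grid could round such a step away, whereas the high-precision copy accumulates it until the low-precision output reaches the target.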
  • FIG. 2 is an explanatory diagram showing an example of input / output of the unit and connection with another unit when focusing on one unit.
  • FIG. 2A shows an example of input and output of one unit
  • FIG. 2B shows an example of coupling between units arranged in two layers.
  • the operation of the unit is represented, for example, by equation (1A): $z = f(u)$, where $u = a + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4$.
  • f () represents an activation function.
  • a represents an intercept
  • $w_1$ to $w_4$ represent parameters such as weights corresponding to each input ($x_1$ to $x_4$).
  • the intercept a is omitted.
  • the intercept a can be regarded as a coefficient (one of parameters) of a constant term having a value of 1.
  • k represents an input to each unit in the layer, more specifically, an identifier of another unit performing the input.
  • It is also possible to write $u_i^{(L)} = \sum_k w_{i,k}^{(L)} z_k^{(L-1)}$.
  • L represents a layer identifier.
  • $w_{i,k}$ is a parameter of each unit i in the layer (the L-th layer), more specifically, the weight of the connection (inter-unit connection) between each unit i and another unit k.
  • the calculation for obtaining the output z from the input x for a certain unit corresponds to the inference processing in the unit.
  • the parameter w is fixed.
  • the calculation for obtaining the parameter w for a certain unit corresponds to a parameter updating process in the unit.
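  • The inference processing of a single unit (activation function f, intercept a, one weight per input) can be sketched as below. The function name `unit_output`, the ReLU activation, and the sample numbers are assumptions chosen for illustration; the source does not fix a particular activation function.

```python
def relu(u):
    return max(0.0, u)


def unit_output(x, w, a=0.0, f=relu):
    """Inference for one unit: weighted sum of the inputs plus intercept, then activation.

    The parameters w and a stay fixed during inference.
    """
    u = a + sum(wk * xk for wk, xk in zip(w, x))
    return f(u)


x = [2.0, 4.0, 1.0, 8.0]        # inputs x1..x4 (assumed sample values)
w = [0.5, 0.25, -1.0, 0.125]    # weights w1..w4 (assumed sample values)
z = unit_output(x, w, a=0.5)
print(z)  # 2.5
```

  • Parameter update processing, by contrast, would take the inputs and an error signal and return new values for `w` and `a`.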
  • FIG. 3 is a block diagram illustrating a configuration example of the learning device according to the first embodiment.
  • the learning device 100 illustrated in FIG. 3 includes a pre-learning model storage unit 101, a learning data storage unit 102, a learning processing unit 106, and a post-learning model storage unit 107.
  • the pre-learning model storage unit 101 stores information on the model before learning.
  • the information of the model before learning may include an initial value of the parameter.
  • the learning data storage unit 102 stores learning data that is data used for learning a model.
  • the format of the learning data is not particularly limited.
  • the learning processing unit 106 performs learning of the model stored in the pre-learning model storage unit 101 using the learning data stored in the learning data storage unit 102.
  • the learning processing unit 106 of the present embodiment includes at least the high-efficiency inference processing unit 103a, the high-precision parameter update processing unit 104b, and the control unit 105.
  • the learning processing unit 106 may further include a high-precision inference processing unit 103b and a high-efficiency parameter update processing unit 104a, as shown in FIG.
  • the high-efficiency inference processing unit 103a performs inference processing for a specified layer or unit with a first calculation accuracy.
  • the high-precision parameter update processing unit 104b performs a parameter update process for a specified layer, unit, or parameter with a second operation accuracy higher than the first operation accuracy.
  • the control unit 105 controls each processing unit that performs the learning process (in this example, the high-efficiency inference processing unit 103a, the high-precision inference processing unit 103b, the high-efficiency parameter update processing unit 104a, and the high-precision parameter update processing unit 104b) so that the necessary learning processing is performed. More specifically, the control unit 105 reads the pre-learning model and the learning data, and controls the switching of the calculation accuracy in the learning process by giving a calculation instruction to each processing unit that performs the learning process.
  • the calculation instruction includes designation of a unit to be calculated and input of parameters necessary for the calculation.
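  • How a control unit might route calculation instructions to the four processing units can be sketched as follows. The routing table, the unit identifiers such as "layer3/unit0", the policy set `HIGH_PRECISION_UPDATE`, and the dictionary layout of an instruction are all hypothetical; only the reference numerals 103a, 103b, 104a, and 104b come from the description.

```python
# Hypothetical routing table: (kind of processing, high precision?) -> processing unit.
PROCESSING_UNITS = {
    ("inference", False): "103a",  # high-efficiency inference processing unit
    ("inference", True): "103b",   # high-precision inference processing unit
    ("update", False): "104a",     # high-efficiency parameter update processing unit
    ("update", True): "104b",      # high-precision parameter update processing unit
}

# Assumed policy: only these units receive a high-precision parameter update.
HIGH_PRECISION_UPDATE = {"layer3/unit0"}


def make_instruction(kind, unit_id, params, high_precision):
    """A calculation instruction names the target unit and carries its parameters."""
    return {
        "processor": PROCESSING_UNITS[(kind, high_precision)],
        "unit": unit_id,
        "params": params,
    }


def plan_learning_step(unit_ids, params):
    """Low-precision inference for every unit, then high-precision update for selected units."""
    plan = [make_instruction("inference", u, params[u], False) for u in unit_ids]
    for u in reversed(unit_ids):
        if u in HIGH_PRECISION_UPDATE:
            plan.append(make_instruction("update", u, params[u], True))
    return plan


params = {"layer1/unit0": [0.1], "layer3/unit0": [0.2]}
plan = plan_learning_step(["layer1/unit0", "layer3/unit0"], params)
print([step["processor"] for step in plan])  # ['103a', '103a', '104b']
```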
  • the post-learning model storage unit 107 stores information on the model after learning.
  • the information on the model after learning may include the updated parameter values of each unit.
  • FIG. 4 is a configuration diagram showing an example of a hardware configuration of the learning processing unit 106.
  • the learning processing unit 106 may be realized by, for example, an arithmetic processing device in which the low-precision arithmetic circuit 11, the high-precision arithmetic circuit 12, the memory 13, and the control device 14 are connected via the bus 15.
  • the high-precision arithmetic circuit 12 may be any circuit that can perform an operation with higher operation accuracy than the low-precision arithmetic circuit 11.
  • the high-efficiency inference processing unit 103a and the high-efficiency parameter update processing unit 104a may be realized by, for example, the low-precision arithmetic circuit 11.
  • the high-precision inference processing unit 103b and the high-precision parameter update processing unit 104b may be realized by, for example, the high-precision arithmetic circuit 12.
  • the control unit 105 may be realized by, for example, the control device 14.
  • the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 are connected via a bus 15, respectively, and can exchange data such as notifying each other of the arithmetic results via the bus 15.
  • a memory 13 may be further connected to the bus 15.
  • the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 can also exchange data via the memory 13.
  • the memory 13 is treated as a part of the communication path.
  • the memory 13 may be mounted on the same chip as the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 as on-chip memory.
  • the low-precision arithmetic circuit 11, the high-precision arithmetic circuit 12, and the memory 13 may be internally connected in the chip.
  • the memory 13 need not be mounted on the same chip as the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12. That is, it may be off-chip memory externally connected via an external memory interface.
  • a measure of the breadth and fineness of the range of the numerical data that a processing unit performing the learning process actually uses for calculation (more specifically, a measure of the breadth and fineness of the range of numerical data determined by the bit width, the handling of the decimal point, and the like in the arithmetic circuit that implements the processing unit) is referred to as "precision" or "operation accuracy".
  • the low calculation accuracy which is the calculation accuracy in the low-precision calculation circuit 11
  • the high calculation accuracy which is the calculation accuracy in the high-precision calculation circuit 12
  • FIG. 5 is an explanatory diagram showing an example of a combination of low operation accuracy, which is the operation accuracy of the low accuracy operation circuit 11, and high operation accuracy, which is the operation accuracy of the high accuracy operation circuit 12.
  • the combination of the calculation accuracy in the low-precision calculation circuit 11 and the calculation accuracy in the high-precision calculation circuit 12 is not limited to that shown in FIG.
  • the calculation accuracy (low calculation accuracy) in the low-precision calculation circuit 11 is any one of {1, 2, 8, 16}-bit fixed point or any one of {1, 2, 8, 16}-bit integer.
  • the calculation accuracy (high calculation accuracy) in the high-precision calculation circuit 12 may be any one of {2, 8, 16, 32}-bit fixed point, any one of {9, 16, 32}-bit floating point, or any one of {8, 16, 24, 32}-bit power-of-2 floating point.
  • the high calculation accuracy is assumed to be higher than the low calculation accuracy (for example, the range of numerical data is wider, the numerical data is finer, or the number of significant digits that can be expressed is larger).
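  • What "wider range" and "finer" mean for two fixed-point formats can be made concrete with a short calculation. The sketch assumes signed two's-complement fixed point; the split between integer and fractional bits (4 of 8 bits, 8 of 16 bits) is a hypothetical choice that the description does not specify, and `fixed_point_range` is an illustrative helper.

```python
def fixed_point_range(total_bits, frac_bits):
    """Return (minimum, maximum, step) of a signed two's-complement fixed-point format."""
    step = 1.0 / (1 << frac_bits)
    lowest = -(1 << (total_bits - 1)) * step
    highest = ((1 << (total_bits - 1)) - 1) * step
    return lowest, highest, step


low = fixed_point_range(8, 4)     # candidate low-precision format
high = fixed_point_range(16, 8)   # candidate high-precision format

print(low)   # (-8.0, 7.9375, 0.0625)
print(high)  # (-128.0, 127.99609375, 0.00390625)
```

  • With these assumed splits the 16-bit format covers a range 16 times wider and resolves steps 16 times smaller than the 8-bit format.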
  • FIG. 6 is a schematic block diagram illustrating a configuration example of a computer according to the learning device 100.
  • the computer 1000 includes a processor 1008, a main storage device 1002, an auxiliary storage device 1003, an interface 1004, a display device 1005, and an input device 1006. Further, the processor 1008 may include various arithmetic and processing devices such as the CPU 1001 and the GPU 1007.
  • the learning device 100 may be implemented in, for example, a computer 1000 as shown in FIG.
  • the operation of the learning device 100 (in particular, the control unit 105) may be stored in the auxiliary storage device 1003 in the form of a program.
  • the CPU 1001 reads out a program from the auxiliary storage device 1003, expands the program in the main storage device 1002, and performs a predetermined process in the learning device 100 according to the program.
  • the CPU 1001 is an example of an information processing device that operates according to a program. Besides a CPU (Central Processing Unit), the computer 1000 may include, for example, an MPU (Micro Processing Unit), an MCU (Memory Control Unit), or a GPU (Graphics Processing Unit).
  • FIG. 6 shows an example in which the computer 1000 includes, in addition to the CPU 1001, a GPU 1007 on which the above low-precision arithmetic circuit 11 and high-precision arithmetic circuit 12 are mounted. The implementation is not limited to this example: the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 may be implemented by another processor or arithmetic unit (such as a MAC (multiplier-accumulator), a multiplier tree, or an ALU (Arithmetic Logic Unit) array, which will be described later). It is sufficient that the computer 1000 has such an arithmetic unit.
  • the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 may be mounted on different chips, and a specific chip configuration is not particularly limited.
  • the auxiliary storage device 1003 is an example of a non-transitory tangible medium.
  • Other examples of non-transitory tangible media include a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD-ROM, and a semiconductor memory connected via the interface 1004.
  • When the program is distributed to the computer 1000 via a communication line, the computer 1000 that has received the distribution may load the program into the main storage device 1002 and execute a predetermined process in the learning device 100.
  • the program may be for realizing a part of a predetermined process in the learning device 100. Further, the program may be a difference program that realizes a predetermined process in the learning device 100 in combination with another program already stored in the auxiliary storage device 1003.
  • the interface 1004 transmits and receives information to and from another device.
  • the display device 1005 presents information to the user. Further, the input device 1006 receives input of information from a user.
  • some elements of the computer 1000 can be omitted depending on the processing content of the learning device 100. For example, if the computer 1000 does not present information to the user, the display device 1005 can be omitted. For example, if the computer 1000 does not accept information input from a user, the input device 1006 can be omitted.
  • Part or all of the above components are implemented by a general-purpose or dedicated circuit (Circuitry), a processor, or a combination thereof. These may be constituted by a single chip, or may be constituted by a plurality of chips connected via a bus. In addition, some or all of the above-described components may be realized by a combination of the above-described circuit and the like and a program.
  • the plurality of information processing devices or circuits may be centrally arranged or distributed.
  • the information processing device, the circuit, and the like may be implemented as a form in which each is connected via a communication network, such as a client and server system or a cloud computing system.
  • the high-efficiency inference processing unit 103a performs, for example, for each unit in a specified layer or for a specified unit, an inference process that calculates the output of the unit with a predetermined low calculation accuracy when an input to the unit is received, and may output the calculation result.
  • the high-efficiency inference processing unit 103a may receive, as inputs, the values of the inputs and the values of other variables (parameters such as weights and intercepts) used for calculating the output of the unit, and perform the above processing.
  • the operation performed in the inference processing may be referred to as an inference operation.
  • a circuit for performing an inference operation is referred to as an “inference circuit”, and in particular, a circuit for performing an inference operation with lower operation accuracy than the operation accuracy of the parameter update operation performed by the high-precision parameter update processing unit 104b is referred to as a “high-efficiency inference circuit”.
  • In this manner, the operation accuracy of the inference circuit is made as low as possible, and at least lower than the operation accuracy of the parameter update operation performed by the high-precision parameter update processing unit 104b (for example, by reducing the bit width from 32 bits to 16 bits, or by replacing floating-point operations with fixed-point operations), to reduce power consumption.
  • a circuit for performing an inference operation with the same operation accuracy as the operation accuracy of the parameter update operation performed by the high-precision parameter update processing unit 104b may be referred to as a “high-accuracy inference circuit”.
  • the above-described high-precision inference processing unit (not shown) may be realized by such a high-precision inference circuit.
  • the configuration of the inference circuit described below can be realized regardless of whether the inference operation is performed with high accuracy or with low accuracy. That is, the difference between the high-efficiency inference processing unit 103a and the high-precision inference processing unit 103b may be only the accuracy of each variable, an adder, and a multiplier used for the operation in the arithmetic circuit in which the operation of the processing unit is implemented.
  • the simplest example of the inference circuit has a configuration in which one multiplier-adder (MAC) 221 in which a multiplier and an adder are combined is provided (see the arithmetic circuit 22a in FIG. 7A).
  • Reference numeral 21 represents a bus.
  • the MAC 221 may include a multiplier, an adder, a storage element holding three inputs, and a storage element holding one output (see FIG. 7B).
  • Here, z corresponds to the output of the unit, a and w correspond to parameters (fixed in the inference processing), and x corresponds to the input of the unit.
  • the operation accuracy of the circuit is determined by the bit width of the multiplier and the adder included in the circuit and the handling of the decimal point (floating point or fixed point, etc.).
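  • when the high-efficiency inference processing unit 103a is realized by the arithmetic circuit 22a, the variables (a, w, x, z) and the arithmetic by the adder and the multiplier in the MAC 221 included in the arithmetic circuit 22a have the low arithmetic accuracy (first arithmetic accuracy).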
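  • As a rough illustration of how bit width and decimal-point handling set the operation accuracy of a MAC, the following Python sketch models one multiply-accumulate step. The formula z = a + w * x, the 16-bit format with 8 fractional bits, and all function names are assumptions made for illustration only; the description states only that z is the output, a and w are parameters, and x is the input.

```python
def saturate(raw, total_bits=16):
    """Clamp an integer to the signed range of total_bits."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, raw))


def to_fixed(value, frac_bits=8, total_bits=16):
    """Quantize a real value to a signed fixed-point integer."""
    return saturate(round(value * (1 << frac_bits)), total_bits)


def from_fixed(raw, frac_bits=8):
    """Convert a fixed-point integer back to a real value."""
    return raw / (1 << frac_bits)


def mac_fixed(a, w, x, frac_bits=8, total_bits=16):
    """Assumed MAC behaviour z = a + w * x, computed in fixed point."""
    a_q = to_fixed(a, frac_bits, total_bits)
    w_q = to_fixed(w, frac_bits, total_bits)
    x_q = to_fixed(x, frac_bits, total_bits)
    product = (w_q * x_q) >> frac_bits  # rescale the product to frac_bits
    z_q = saturate(a_q + product, total_bits)
    return from_fixed(z_q, frac_bits)


def mac_float(a, w, x):
    """The same MAC in ordinary floating point, for comparison."""
    return a + w * x
```

  • Values that are exactly representable in the assumed format give identical results in both versions, while other values differ by a small quantization error.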
  • FIGS. 8 to 10 are schematic configuration diagrams showing another example of an operation circuit (inference circuit) for inference operation.
  • the inference circuit may have a configuration in which a plurality of MACs 221 are connected in parallel (a configuration of a GPU), for example, as in an arithmetic circuit 22b illustrated in FIG. Even in such a configuration, the operation accuracy of the circuit is determined by the bit width of the multiplier and the adder included in the circuit and the handling of the decimal point (floating point or fixed point, etc.).
  • the inference circuit may have a configuration in which a plurality of multiply-add trees 223 are connected in parallel via a memory layer 222, for example, as in the arithmetic circuit 22c shown in FIG. 9.
  • the multiply-add tree 223 shown in FIG. 9 is a circuit having a configuration in which four multipliers, two adders, and one adder are connected in a tree shape. Note that an example of the arithmetic circuit 22c shown in FIG. 9 is also disclosed in Non-Patent Document 3. Even in such a configuration, the operation accuracy of the circuit is determined by the bit width of the multiplier and the adder included in the circuit and the handling of the decimal point (floating point or fixed point, etc.).
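  • The tree structure described above (four multipliers feeding two adders, which feed one final adder) can be sketched as follows. This is a minimal software model with a hypothetical function name, not the circuit itself; it assumes the four products are formed from four weight-input pairs.

```python
def multiply_add_tree(weights, inputs):
    """Software model of a 4-input multiply-add tree."""
    if len(weights) != 4 or len(inputs) != 4:
        raise ValueError("this sketch models a tree with exactly four multipliers")
    # Level 1: four multipliers operate independently.
    products = [w * x for w, x in zip(weights, inputs)]
    # Level 2: two adders each combine a pair of products.
    left = products[0] + products[1]
    right = products[2] + products[3]
    # Level 3: one adder combines the two partial sums.
    return left + right
```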
  • the inference circuit may have a configuration in which a plurality of ALUs 224 are connected in an array via the memory layer 222 (systolic array configuration), for example, as in the arithmetic circuit 22d shown in FIG. 10.
  • An example of the arithmetic circuit 22d shown in FIG. 10 is also disclosed in Non-Patent Document 1. Even in such a configuration, the operation accuracy of the circuit is determined by the bit width of the multiplier and the adder included in the circuit and the handling of the decimal point (floating point or fixed point, etc.).
  • when the high-efficiency inference processing unit 103a is realized by the arithmetic circuit 22b, the arithmetic circuit 22c, or the arithmetic circuit 22d shown in the figures, it is only necessary that each variable used for the arithmetic in the circuit and the arithmetic by the adders and multipliers correspond to the low calculation accuracy (first calculation accuracy).
  • when the high-precision inference processing unit 103b is realized by the arithmetic circuit 22a, the arithmetic circuit 22b, the arithmetic circuit 22c, or the arithmetic circuit 22d, it is only necessary that each variable used for the arithmetic in the circuit and the arithmetic by the adders and multipliers correspond to the high calculation accuracy (second calculation accuracy).
  • the high-precision parameter update processing unit 104b performs, for example, for each parameter in each unit of a specified layer, for each parameter in a specified unit, or for a specified parameter, a parameter update process that solves an optimization problem of an objective function (such as an error function) including the parameter as an adjustment parameter and updates the adjustment parameter, with a predetermined high calculation accuracy, and may output the updated value.
  • the high-precision parameter update processing unit 104b may receive, as input, the values of the variables (which may include the value of the parameter before updating) used in solving the optimization problem, and perform the above processing.
  • the operation performed in the parameter update processing may be referred to as a parameter update operation.
  • a circuit for performing the parameter update operation is referred to as a “parameter update circuit”, and in particular, a circuit for performing the parameter update operation with higher operation accuracy than the operation accuracy of the inference operation performed by the high-efficiency inference processing unit 103a is referred to as a “high-precision parameter update circuit”.
  • a circuit for performing a parameter updating operation with the same operation accuracy as the inference operation performed by the high-efficiency inference processing unit 103a is referred to as a “high-efficiency parameter updating circuit”.
  • the above-described high-efficiency parameter update processing unit may be realized by such a high-efficiency parameter update circuit.
  • the configuration of the parameter updating circuit described below can be realized irrespective of whether the parameter updating operation is performed with high accuracy or with low accuracy. In other words, the difference between the high-efficiency parameter update processing unit 104a and the high-precision parameter update processing unit 104b may be only the accuracy of each variable, adder, and multiplier used in the operation in the arithmetic circuit that implements the operation of the processing unit.
  • the simplest example of the parameter updating circuit has, like the inference circuit, a configuration including one multiplier-adder (MAC) 221 in which a multiplier and an adder are combined (see the arithmetic circuit 22a and the MAC 221 in FIG. 7).
  • the parameter updating circuit can also be realized by, for example, the arithmetic circuits 22b, 22c, and 22d shown in FIGS. That is, the arithmetic circuits shown in FIGS. 7 to 10 are also examples of arithmetic circuits for parameter update arithmetic.
  • when the high-precision parameter update processing unit 104b is realized by the arithmetic circuit 22a, the arithmetic circuit 22b, the arithmetic circuit 22c, or the arithmetic circuit 22d, it is only necessary that each variable used for the arithmetic in the circuit and the arithmetic by the adders and multipliers correspond to the high calculation accuracy (second calculation accuracy).
  • more specifically, it is only necessary that the accuracy of each variable, addition, and multiplication used for the parameter update operation in the circuit be higher than the accuracy of each variable, addition, and multiplication used for the inference operation in the arithmetic circuit that realizes the high-efficiency inference processing unit 103a.
  • when the high-efficiency parameter update processing unit 104a is realized by the arithmetic circuit 22a, the arithmetic circuit 22b, the arithmetic circuit 22c, or the arithmetic circuit 22d, it is only necessary that each variable used for the arithmetic in the circuit and the arithmetic by the adders and multipliers correspond to the low operation accuracy (first operation accuracy).
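  • The following Python sketch illustrates one reason the accuracy of the parameter update operation can matter: when the update amount is smaller than half the resolution of the stored parameter, the update is rounded away. The gradient-descent rule, the 8 fractional bits, and the function names are assumptions chosen for illustration; the description itself only requires that the update accuracy be higher than the inference accuracy.

```python
def quantize(value, frac_bits):
    """Round a value to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def update_parameter(weight, gradient, learning_rate, frac_bits=None):
    """One assumed gradient-descent step.

    frac_bits=None keeps ordinary floating-point precision (high accuracy);
    otherwise the result is stored in a coarse fixed-point format (low accuracy).
    """
    new_weight = weight - learning_rate * gradient
    if frac_bits is None:
        return new_weight
    return quantize(new_weight, frac_bits)


high = update_parameter(0.5, 0.1, 0.01)               # update of 0.001 is kept
low = update_parameter(0.5, 0.1, 0.01, frac_bits=8)   # resolution is 1/256, update is lost
```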
  • FIG. 11 is a flowchart illustrating an example of the operation of the learning device 100 according to the present embodiment. The operation illustrated in FIG. 11 is performed based on, for example, control by the control unit 105.
  • the control unit 105 reads the pre-learning model from the pre-learning model storage unit 101 and also reads the learning data from the learning data storage unit 102 (step S11).
  • the control unit 105 controls the high-efficiency inference processing unit 103a and the high-precision inference processing unit 103b as necessary to sequentially perform inference processing on each unit included in all of the first to nth layers (step S12: forward propagation).
  • the control unit 105 causes the high-efficiency inference processing unit 103a to perform inference processing of at least some of the units.
  • the control unit 105 may cause the high-efficiency inference processing unit 103a to perform inference processing for all units, or may cause the high-efficiency inference processing unit 103a to perform inference processing for some units.
  • the control unit 105 may cause the high-precision inference processing unit 103b to perform the inference processing of the remaining units.
  • the high-efficiency inference processing unit 103a and the high-accuracy inference processing unit 103b execute inference processing for a specified layer or unit in accordance with an instruction from the control unit 105.
  • the control unit 105 controls the high-efficiency parameter update processing unit 104a and the high-precision parameter update processing unit 104b as necessary to perform parameter update processing on predetermined parameters among the parameters used for calculating the output of the units of each layer (step S13: parameter update processing).
  • the control unit 105 causes the high-precision parameter update processing unit 104b to perform a parameter update process on at least some of the parameters.
  • the control unit 105 may cause the high-precision parameter update processing unit 104b to perform parameter update processing for all parameters, or may cause the high-precision parameter update processing unit 104b to perform parameter update processing for only some parameters.
  • When causing the high-precision parameter update processing unit 104b to perform the parameter update processing of only some of the parameters, the control unit 105 may cause the high-efficiency parameter update processing unit 104a to perform the parameter update processing of all the remaining parameters, or of only a part of the remaining parameters. In the latter case, the parameter update processing itself is omitted for some parameters.
  • the high-efficiency parameter update processing unit 104a and the high-precision parameter update processing unit 104b execute the parameter update processing of the designated parameter according to the instruction from the control unit 105.
  • the control unit 105 stores the learned model including the parameters updated in step S13 in the learned model storage unit 107 (step S14).
  • the operations of steps S11 to S14 may be repeated for the number of pieces of learning data.
  • the learned model as a learning result for the immediately preceding learning data is used as a pre-learning model of learning for the next learning data.
  • steps S12 to S13 can be repeatedly performed for the number of pieces of learning data.
  • it is also possible to repeat the above-described operation of steps S11 to S14 or the operation of steps S12 to S14 a plurality of times using the same learning data (epoch processing).
  • the range in which inference processing is performed with low calculation accuracy (low-precision inference range) may be determined in advance, may be specified by the user, or may be changed at each processing (for each learning data or each repetition of the epoch processing).
  • the range in which the parameter update processing is performed with high calculation accuracy (high-precision parameter update range) may be limited to only the fully connected layer.
  • the high-precision parameter update range, the range in which parameter update processing is performed with low calculation accuracy (low-precision parameter update range), and the range in which parameter update processing is not performed may likewise be determined in advance, specified by the user, or changed at each processing (each learning data or each repetition of the epoch processing).
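  • The control flow of steps S12 and S13 can be sketched as a dispatch plan that assigns each layer to a processing unit according to the configured ranges. The function name, the use of layer-number sets to express the ranges, and the output-to-input order of the update pass are assumptions made for illustration.

```python
def plan_step(num_layers, low_inference, high_update, low_update):
    """Return (stage, layer, unit) assignments for one piece of learning data.

    low_inference: layers in the low-precision inference range
    high_update:   layers in the high-precision parameter update range
    low_update:    layers in the low-precision parameter update range
    Layers in neither update range receive no parameter update.
    """
    plan = []
    # Step S12: forward propagation over layers 1..n.
    for layer in range(1, num_layers + 1):
        unit = "103a" if layer in low_inference else "103b"
        plan.append(("inference", layer, unit))
    # Step S13: parameter update (assumed to run from the output side).
    for layer in range(num_layers, 0, -1):
        if layer in high_update:
            plan.append(("update", layer, "104b"))
        elif layer in low_update:
            plan.append(("update", layer, "104a"))
    return plan


# Hypothetical 3-layer configuration: layer 1 is never updated.
example_plan = plan_step(3, low_inference={1, 2}, high_update={3}, low_update={2})
```

  • Because the ranges are plain arguments, they can be fixed in advance, supplied by the user, or changed between calls, which mirrors the options described above.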
  • FIGS. 12 and 13 are flowcharts showing more specific operation examples of the learning device 100 of the present embodiment.
  • the operation examples shown in FIGS. 12 and 13 are examples in which the operation of each step is illustrated by focusing on the hardware configuring the learning device 100.
  • the hardware configuration is assumed to be the configuration shown in FIG.
  • the low-precision arithmetic circuit 11 as the high-efficiency inference processing unit 103a reads the learning data and the pre-learning model from the memory 13 in response to an instruction from the control device 14 as the control unit 105 (step S111).
  • the low-precision arithmetic circuit 11 performs a part of forward propagation (in this example, an inference operation for calculating the output of each unit included in each of the first to (k-1)th layers) with low arithmetic accuracy (step S112). Then, the low-precision arithmetic circuit 11 stores the arithmetic result of step S112 (in this example, the output from each unit of the (k-1)th layer) in the memory 13 (step S113).
  • the pre-learning model is a neural network having a multilayer structure of n + 1 layers from the 0th layer to the nth layer, with the input layer being the 0th layer and the output layer being the nth layer.
  • the (k-1)th layer is an intermediate layer that is downstream of the input layer (0th layer) and upstream of the output layer (nth layer). That is, k is an integer satisfying 0 < k-1 < n.
  • the high-precision arithmetic circuit 12 as the high-precision inference processing unit 103b reads the operation result (output from each unit of the (k-1)th layer) stored in step S113 according to the instruction of the control device 14 (step S211).
  • the high-precision arithmetic circuit 12 performs the continuation of forward propagation (in this example, an inference operation for calculating the output of each unit included in each of the k-th layer to the n-th layer) with high arithmetic accuracy (step S212).
  • the high-precision arithmetic circuit 12 serving as the high-precision parameter update processing unit 104b performs, in accordance with an instruction from the control device 14, a parameter update operation for updating the parameters (such as connection weights with other units) in each unit included in some of the layers (the k-th to n-th layers in this example) with high operation accuracy (step S212).
  • the high-precision arithmetic circuit 12 stores the arithmetic result of step S212 (in this example, updated parameters in each unit included in each of the k-th layer to the n-th layer) in the memory 13 (step S213).
  • the updated parameter stored as the calculation result in step S213 corresponds to the learned model described above.
  • in this example, the low-precision arithmetic circuit 11 performs inference processing on some layers as the high-efficiency inference processing unit 103a, and then the high-precision arithmetic circuit 12 performs processing as the high-precision parameter update processing unit 104b.
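  • The split of forward propagation at layer k (steps S111 to S113 on the low-precision circuit, steps S211 and S212 on the high-precision circuit) can be sketched as follows. For brevity each layer is reduced to a single unit computing a + w * input, the low-precision circuit is modelled by rounding to 8 fractional bits, and the memory 13 is modelled by a dictionary; all of these, and the function names, are illustrative assumptions.

```python
def quantize(value, frac_bits=8):
    """Round a value to the nearest multiple of 2**-frac_bits (assumed low-precision format)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def forward_split(layers, x, k, frac_bits=8):
    """Forward propagation split at layer k.

    layers[i] holds the hypothetical (a, w) parameters of layer i + 1.
    """
    memory = {}

    # Steps S111-S112: layers 1 .. k-1 on the low-precision circuit.
    value = quantize(x, frac_bits)
    for index in range(1, k):
        a, w = layers[index - 1]
        value = quantize(
            quantize(a, frac_bits) + quantize(w, frac_bits) * value, frac_bits
        )

    # Step S113: store the output of layer k-1.
    memory["layer_k_minus_1_output"] = value

    # Step S211: the high-precision circuit reads the stored result.
    value = memory["layer_k_minus_1_output"]

    # Step S212: layers k .. n in ordinary floating point.
    for index in range(k, len(layers) + 1):
        a, w = layers[index - 1]
        value = a + w * value
    return value
```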
  • the low-precision arithmetic circuit 11 as the high-efficiency inference processing unit 103a reads the learning data and the pre-learning model from the memory 13 in accordance with an instruction from the control device 14 as the control unit 105 (step S121).
  • the low-precision arithmetic circuit 11 performs forward propagation (in this example, an inference operation for calculating the output of each unit included in each of the first to nth layers) with low arithmetic accuracy (step S122). Then, the low-precision arithmetic circuit 11 stores the arithmetic result of step S122 (in this example, the output from the unit of the nth layer which is the output layer) in the memory 13 (step S123).
  • the pre-learning model is a neural network having a multilayer structure of n + 1 layers from the 0th layer to the nth layer, with the input layer being the 0th layer and the output layer being the nth layer.
  • the high-precision arithmetic circuit 12 reads out the operation result (output from the unit of the n-th layer which is the output layer) stored in step S123 according to the instruction of the control device 14 (step S221).
  • the high-precision arithmetic circuit 12 performs, in response to an instruction from the control device 14, a parameter update operation for updating the parameters (such as connection weights) in each unit included in some layers (in this example, the k-th layer to the n-th layer) with high calculation accuracy (step S222).
  • the high-precision arithmetic circuit 12 stores the arithmetic result of step S222 (in this example, updated parameters in each unit included in each of the k-th layer to the n-th layer) in the memory 13 (step S223).
  • the updated parameter stored as the calculation result in step S223 corresponds to the learned model described above.
  • in this example, the low-precision arithmetic circuit 11 performs inference processing on all layers as the high-efficiency inference processing unit 103a, and then the high-precision arithmetic circuit 12 performs processing as the high-precision parameter update processing unit 104b.
  • the low-precision arithmetic circuit 11 may further perform the operation shown in FIG. 14 as the high-efficiency parameter update processing unit 104a.
  • the low-precision arithmetic circuit 11, as the high-efficiency parameter update processing unit 104a, reads out the updated parameters in the units included in the k-th layer to the n-th layer stored in the memory 13 (step S231).
  • the low-precision arithmetic circuit 11 performs a parameter update operation for updating the parameters (such as connection weights with other units) in each unit included in the remaining layers (in this example, the first to (k-1)th layers) with low operation accuracy (step S232). Then, the low-precision arithmetic circuit 11 saves the arithmetic result of step S232 (in this example, updated parameters in each unit included in each of the first to (k-1)th layers) in the memory 13 (step S233).
  • the updated parameters stored as the calculation results in step S213 or S223 and the updated parameters stored as the calculation results in step S233 correspond to the learned model described above.
  • the operations shown in FIGS. 12 to 14 are examples of learning processing for one learning data. Therefore, when a plurality of pieces of learning data are held, it is possible to repeat the above-described operations, or the respective operation steps included in them, for the number of pieces of learning data. Also, regardless of the number of learning data, it is possible to repeat the above operations or the operation steps included in them a plurality of times using the same learning data (epoch processing). Further, the k-th layer to the n-th layer, which are the high-precision parameter update range in the above operations, may be fully connected layers, or k may be specified by the user or changed every time processing is performed.
  • the calculation processing of the learning algorithm is divided into inference processing and parameter update processing, at least a part of the inference processing is calculated with low calculation accuracy, and at least a part of the parameter update processing is calculated with high calculation accuracy.
  • FIG. 15 is a block diagram illustrating a configuration example of a main part of the data processing device according to the second embodiment.
  • the data processing device 300 illustrated in FIG. 15 includes a low-precision arithmetic processing unit 31, a high-precision arithmetic processing unit 32, a communication path 33, and a data conversion unit 34.
  • the low-precision calculation processing unit 31 is a processing unit that performs a predetermined calculation with relatively low calculation accuracy.
  • the relatively low calculation accuracy may be any calculation accuracy that is lower than the calculation accuracy of the calculation performed by the high-precision calculation processing unit 32.
  • the high-precision operation processing unit 32 is a processing unit that performs a predetermined operation with relatively high operation accuracy.
  • the relatively high calculation accuracy may be any calculation accuracy that is higher than the calculation accuracy of the calculation performed by the low-precision calculation processing unit 31.
  • the low-precision arithmetic processing unit 31 may be, for example, the high-efficiency inference processing unit 103a or the high-efficiency parameter update processing unit 104a.
  • the high-precision arithmetic processing unit 32 may be, for example, the high-precision inference processing unit 103b or the high-precision parameter update processing unit 104b.
  • a measure of the width and fineness of the range of numerical data used for the operations actually performed in the data processing by the low-precision operation processing unit 31 and the high-precision operation processing unit 32 (more specifically, a measure of the width or fineness of the range of numerical data determined by the bit width and the handling of the decimal point in the arithmetic circuit that implements the processing unit) is referred to as “accuracy” or “operation accuracy”.
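  • To make the notions of "width" and "fineness" concrete, the sketch below computes the representable range and the resolution of a signed fixed-point format from its bit width and the position of the decimal point. The two's-complement fixed-point interpretation and the function name are assumptions for illustration; floating-point formats would need a different calculation.

```python
def fixed_point_properties(total_bits, frac_bits):
    """Return (minimum, maximum, step) of a signed two's-complement fixed-point format.

    total_bits: bit width of the format
    frac_bits:  number of bits to the right of the binary point
    """
    step = 2.0 ** -frac_bits                       # fineness (resolution)
    minimum = -(2 ** (total_bits - 1)) * step      # lower end of the range
    maximum = (2 ** (total_bits - 1) - 1) * step   # upper end of the range
    return minimum, maximum, step
```

  • With the same 16-bit width, moving the binary point trades range for resolution: 8 fractional bits give a range of about ±128 in steps of 1/256, while 0 fractional bits give ±32768 in steps of 1.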
  • the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32 are connected via a communication path 33 and a data conversion unit 34.
  • the data conversion unit 34 is provided between the high-precision arithmetic processing unit 32 and the communication path 33.
  • the communication path 33 may be realized by, for example, a bus.
  • the communication path 33 may be realized by a connection circuit (Inter-connect) provided inside the chip.
  • the communication path 33 may include not only a bus and a connection circuit but also a memory (such as an external memory and a buffer) connected to the bus and the connection circuit.
  • the data conversion unit 34 performs a predetermined conversion process on data exchanged between the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32. At this time, the data conversion by the data conversion unit 34 is performed so that the communication (data exchange) on the communication path 33 becomes, for example, data communication at the calculation accuracy for which the data amount (communication amount per data) is smaller.
  • the data conversion unit 34 converts the transmission and reception data so that each data passing through the communication path 33 becomes data of whichever of the calculation accuracy of the low-precision calculation processing unit 31 and the calculation accuracy of the high-precision calculation processing unit 32 has the smaller data amount.
  • the data conversion unit 34 may perform data conversion such that each data passing through the communication path 33 becomes data of the calculation accuracy of the low-precision calculation processing unit 31.
  • the data conversion includes a type conversion that matches the data type to the data type with the lower operation accuracy among the processing units that are the communication end points, data compression (in particular, numerical data compression such as compression of a numerical sequence or reduction of the number of digits), and the synthesis of two or more converted data.
  • the data conversion unit 34 receives, for example, data transmitted from the low-precision arithmetic processing unit 31 to the high-precision arithmetic processing unit 32 via the communication path 33, converts the received data (low-operation-accuracy data) into data of the calculation accuracy (high calculation accuracy) handled by the high-precision calculation processing unit 32, and passes it to the high-precision calculation processing unit 32.
  • the data conversion unit 34 receives, for example, data transmitted from the high-precision arithmetic processing unit 32 to the low-precision arithmetic processing unit 31, converts the received data (high-operation-accuracy data) into data of the calculation accuracy (low calculation accuracy) handled by the low-precision arithmetic processing unit 31, and transmits it to the communication path 33.
  • for example, when the operation accuracy (here, the data type of the numerical values used in the operation) of the low-precision operation processing unit 31 is integer 16 bits (INT16) and the operation accuracy of the high-precision operation processing unit 32 is floating-point 32 bits (FP32), the data conversion unit 34 may perform data conversion so that the data passing through the communication path 33 becomes integer 16-bit data.
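  • A minimal sketch of this INT16/FP32 case is shown below, using Python's standard struct module to make the per-value data amount visible (2 bytes on the communication path instead of 4). The scale factor that maps floating-point values onto 16-bit integers, the little-endian byte order, and the function names are assumptions for illustration; the description does not specify how values are scaled.

```python
import struct

SCALE = 1 << 8  # assumed: INT16 payload interpreted with 8 fractional bits


def fp32_to_int16_bytes(value):
    """High-precision side -> communication path: FP32 value to a 2-byte INT16 payload."""
    raw = max(-32768, min(32767, round(value * SCALE)))  # saturate to INT16
    return struct.pack("<h", raw)


def int16_bytes_to_fp32(payload):
    """Communication path -> high-precision side: 2-byte INT16 payload to a float."""
    (raw,) = struct.unpack("<h", payload)
    return raw / SCALE


payload = fp32_to_int16_bytes(1.5)
restored = int16_bytes_to_fp32(payload)
fp32_size = len(struct.pack("<f", 1.5))  # size of the same value sent as FP32
```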
  • the communication of the data passing through the communication path 33 may be one-way communication (for example, only transmission from the low-precision arithmetic processing unit 31 to the high-precision arithmetic processing unit 32, or only transmission from the high-precision arithmetic processing unit 32 to the low-precision arithmetic processing unit 31).
  • in that case, the data conversion unit 34 only needs to perform data conversion corresponding to the communication actually performed.
  • the data conversion unit 34 may be realized by a dedicated data conversion circuit that performs data conversion designed according to the configuration and operation of the data processing device 300.
  • By mounting the data conversion unit 34 in a dedicated circuit, it is possible to omit processing for generalization, such as reading set values and states and branching according to them, and further efficiency can be achieved.
  • In addition, by dedicating the data conversion, it becomes easy to implement conversion of multiple data at once, or to perform data conversion of multiple data in parallel, collect the results, and transmit them collectively, so that efficiency can be further improved.
  • the parallel processing of the data conversion and the compilation of the results are one of the data conversion examples in which the conversion of each data and the synthesis of the converted data are combined.
  • the data conversion unit 34 may realize such data conversion by, for example, a SIMD (Single Instruction, Multiple Data) operation.
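  • The idea of converting several data items and sending the collected result in one transfer can be sketched as follows. Python has no SIMD instructions in its standard library, so the list comprehension below only mimics applying one conversion to many values; the scale factor, byte order, and function names are illustrative assumptions.

```python
import struct


def convert_and_pack(values, scale=256):
    """Convert each value to INT16 and synthesize the results into one payload."""
    raws = [max(-32768, min(32767, round(v * scale))) for v in values]
    return struct.pack(f"<{len(raws)}h", *raws)


def unpack_and_convert(payload, scale=256):
    """Split one payload back into INT16 values and convert each to float."""
    count = len(payload) // 2
    return [raw / scale for raw in struct.unpack(f"<{count}h", payload)]


batch = convert_and_pack([0.5, -1.0, 2.25, 3.0])
```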
  • FIG. 16 is a block diagram showing another configuration example of the data processing device of the second embodiment.
  • in the above example, the data conversion unit 34 is provided only on the high-precision calculation processing unit 32 side, but a data conversion unit may also be provided on the low-precision calculation processing unit 31 side.
  • the data processing device 300 shown in FIG. 16 is different from the configuration shown in FIG. 15 in that a data conversion unit 35 is further provided between the low-precision arithmetic processing unit 31 and the communication path 33. That is, in this example, the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32 are connected via the data conversion unit 35, the communication path 33, and the data conversion unit 34.
  • the data conversion unit 35 performs a predetermined conversion process on data exchanged between the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32.
  • the data conversion by the data conversion unit 34 and the data conversion unit 35 is performed so that the communication (data exchange) on the communication path 33 becomes data communication whose data amount is smaller than that of data communication performed at the calculation accuracy handled by the low-precision calculation processing unit 31.
  • the data conversion unit 34 and the data conversion unit 35 convert the transmission/reception data so that each data passing through the communication path 33 becomes data of an operation accuracy (hereinafter, ultra-low operation accuracy) whose data amount is smaller than that of data communication performed at the operation accuracy of the low-precision operation processing unit 31.
  • For example, the data conversion unit 34 receives data transmitted from the low-precision arithmetic processing unit 31 to the high-precision arithmetic processing unit 32 via the data conversion unit 35 and the communication path 33, converts the received data (ultra-low-operation-accuracy data produced by the data conversion unit 35) into data of the operation accuracy handled by the high-precision arithmetic processing unit 32 (high operation accuracy), and passes it to the high-precision arithmetic processing unit 32.
  • The data conversion unit 34 also receives, for example, data transmitted from the high-precision arithmetic processing unit 32 to the low-precision arithmetic processing unit 31, converts the received data (high-operation-accuracy data) into ultra-low-operation-accuracy data, and transmits it to the communication path 33.
  • Similarly, the data conversion unit 35 receives, for example, data transmitted from the high-precision arithmetic processing unit 32 to the low-precision arithmetic processing unit 31 via the data conversion unit 34 and the communication path 33, converts the received data (ultra-low-operation-accuracy data produced by the data conversion unit 34) into data of the operation accuracy handled by the low-precision arithmetic processing unit 31 (low operation accuracy), and passes it to the low-precision arithmetic processing unit 31.
  • The data conversion unit 35 also receives, for example, data transmitted from the low-precision arithmetic processing unit 31 to the high-precision arithmetic processing unit 32, converts the received data (low-operation-accuracy data) into ultra-low-operation-accuracy data, and transmits it to the communication path 33.
  • For example, suppose that the operation accuracy (here, the data type of the numerical values used in the operation) of the low-precision arithmetic processing unit 31 is a 16-bit integer (INT16) and that the operation accuracy of the high-precision arithmetic processing unit 32 is 32-bit floating point (FP32).
  • In that case, the data conversion unit 34 and the data conversion unit 35 may compress the data passing through the communication path 33 into a 12-bit integer (INT12) or an 8-bit integer (INT8), each of which has a smaller data amount than INT16.
  • The data conversion unit 34 and the data conversion unit 35 perform numerical data compression that reduces only the accuracy (for example, by discarding lower bits), so that the data does not lose its meaning as numerical data.
  • The data conversion unit 34 and the data conversion unit 35 can also compress data by exploiting the characteristics of the activation function used in deep learning. For example, if a step function, which is one of the activation functions, is used, the data can be compressed to 1 bit. If ReLU (the ramp function) is used, the output is never negative, so the sign bit of the data can be omitted.
  • The data conversion unit 34 and the data conversion unit 35 may pack together a plurality of data items having an irregular number of bits (pack processing), or decompose data packed in this way back into the individual data items (unpack processing). Specializing the units for pack/unpack processing improves the efficiency of that processing.
  • For each combination, it suffices to provide the data conversion unit 34 or the data conversion unit 35 at one or both end points of the communication path used for inter-core communication in that combination.
  • FIG. 17 is a block diagram showing an outline of the data processing device of the present invention.
  • The data processing device 500 shown in FIG. 17 includes a low-precision arithmetic processing unit 501, a high-precision arithmetic processing unit 502, and a first data conversion unit 504.
  • The low-precision arithmetic processing unit 501 (for example, the low-precision arithmetic processing unit 31) performs a predetermined operation with a first accuracy.
  • The high-precision arithmetic processing unit 502 (for example, the high-precision arithmetic processing unit 32) performs a predetermined operation with a second accuracy higher than the first accuracy.
  • The first data conversion unit 504 (for example, the data conversion unit 34) is provided at the end point, on the high-precision arithmetic processing unit 502 side, of the communication path 503 for transferring data between the high-precision arithmetic processing unit 502 and the low-precision arithmetic processing unit 501.
  • The first data conversion unit 504 performs a predetermined conversion on data passing between the communication path 503 and the high-precision arithmetic processing unit 502 so that the data passed to and from the connected high-precision arithmetic processing unit 502 is data that the high-precision arithmetic processing unit 502 can handle, the amount of data passing through the communication path 503 is equal to or less than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path 503 is equal to or less than the first accuracy.
  • FIG. 18 is a configuration diagram showing a configuration example of the data processing circuit of the present invention.
  • The data processing circuit 600 illustrated in FIG. 18 includes a low-precision arithmetic circuit 601, a high-precision arithmetic circuit 602, and a first data conversion circuit 604.
  • The low-precision arithmetic circuit 601 (for example, the low-precision arithmetic processing unit 31 or the low-precision arithmetic circuit 11) performs a predetermined operation with a first accuracy.
  • The high-precision arithmetic circuit 602 (for example, the high-precision arithmetic processing unit 32 or the high-precision arithmetic circuit 12) performs a predetermined operation with a second accuracy higher than the first accuracy.
  • The first data conversion circuit 604 (for example, the data conversion unit 34) is provided at the end point, on the high-precision arithmetic circuit 602 side, of the communication path 603 for transferring data between the high-precision arithmetic circuit 602 and the low-precision arithmetic circuit 601, and performs a predetermined conversion on data passing between the communication path 603 and the high-precision arithmetic circuit 602.
  • Here, the data passed between the first data conversion circuit 604 and the connected high-precision arithmetic circuit 602 is data handled by the high-precision arithmetic circuit 602, the amount of data passing through the communication path 603 is equal to or less than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path 603 is equal to or less than the first accuracy.
  • The data processing circuit 600 may further include a second data conversion circuit 605 that is provided at the end point of the communication path 603 on the low-precision arithmetic circuit 601 side and performs a predetermined conversion on data passing between the communication path 603 and the low-precision arithmetic circuit 601.
  • In that case, the data passed between the second data conversion circuit 605 and the connected low-precision arithmetic circuit 601 is data handled by the low-precision arithmetic circuit 601, the amount of data passing through the communication path 603 is smaller than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path 603 is lower than the first accuracy.
  • A data processing device comprising: low-precision arithmetic processing means for performing a predetermined operation with a first precision; high-precision arithmetic processing means for performing a predetermined operation with a second precision higher than the first precision; and first data conversion means provided at the end point, on the high-precision arithmetic processing means side, of a communication path for transferring data between the high-precision arithmetic processing means and the low-precision arithmetic processing means, wherein the first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means so that the data passed to and from the high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is equal to or less than the data amount when data of the first precision is used, and the precision of the data passing through the communication path is equal to or less than the first precision.
  • The data processing device according to Supplementary Note 1, wherein the first data conversion means receives, from the communication path, data passed from the low-precision arithmetic processing means to the high-precision arithmetic processing means as data of the first precision and converts the received first-precision data into data of a precision that the high-precision arithmetic processing means can handle, and receives data passed from the high-precision arithmetic processing means to the low-precision arithmetic processing means as data that the high-precision arithmetic processing means can handle and converts the received data into data of a precision that the low-precision arithmetic processing means can handle.
  • The data processing device according to Supplementary Note 1 or 2, further comprising second data conversion means provided at the end point of the communication path on the low-precision arithmetic processing means side, wherein each of the first data conversion means and the second data conversion means performs a predetermined conversion on data passing between the communication path and the arithmetic processing means at its connection destination so that the data passed to and from that arithmetic processing means is data that it can handle, the amount of data passing through the communication path is smaller than the data amount when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision.
  • In the data processing device, each of the first data conversion means and the second data conversion means receives, from the communication path, data passed from the other arithmetic processing means to the arithmetic processing means at its connection destination as data of a predetermined third accuracy lower than the first accuracy, and converts the received third-accuracy data into data of an accuracy that can be handled by the arithmetic processing means at the connection destination; and each receives data passed from the arithmetic processing means at the connection destination to the other arithmetic processing means as data that the arithmetic processing means at the connection destination can handle, and converts the received data into data of an accuracy that can be handled by the other arithmetic processing means.
  • The data processing device according to Supplementary Note 3 or 4, wherein at least one of the first data conversion means and the second data conversion means, when transmitting converted data to the communication path, transmits a plurality of converted data items together, and at least one of the first data conversion means and the second data conversion means receives the plurality of converted data items transmitted together from the communication path, decomposes them, and converts each data item into data of an accuracy that can be handled by the arithmetic processing means at its connection destination.
  • The data processing device according to any one of the preceding supplementary notes (Supplementary Note 1 onward), wherein the data processing device is a learning device that learns a predetermined discriminant model composed of two or more units connected in layers, and comprises learning means for performing, when learning data is input, inference processing that calculates the output of each unit of the discriminant model in a predetermined order, and parameter update processing that updates at least some of the parameters used to calculate the output of each unit based on the result of the inference processing; and the learning means includes, as the low-precision arithmetic processing means, high-efficiency inference means for performing specified operations among the operations of the inference processing with a first operation accuracy, and, as the high-precision arithmetic processing means, high-precision parameter update means for performing specified operations among the operations of the parameter update processing with a second operation accuracy higher than the first operation accuracy.
  • A data processing circuit comprising: a low-precision arithmetic circuit that performs a predetermined operation with a first precision; a high-precision arithmetic circuit that performs a predetermined operation with a second precision higher than the first precision; and a first data conversion circuit that is provided at the end point, on the high-precision arithmetic circuit side, of a communication path for transferring data between the high-precision arithmetic circuit and the low-precision arithmetic circuit, and that performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic circuit, wherein the data passed between the first data conversion circuit and the connected high-precision arithmetic circuit is data handled by the high-precision arithmetic circuit, the amount of data passing through the communication path is equal to or less than the data amount when data of the first precision is used, and the precision of the data passing through the communication path is equal to or less than the first precision.
  • The data processing circuit according to Supplementary Note 8, further comprising a second data conversion circuit that is provided at the end point of the communication path on the low-precision arithmetic circuit side and performs a predetermined conversion on data passing between the communication path and the low-precision arithmetic circuit, wherein the data passed between the second data conversion circuit and the connected low-precision arithmetic circuit is data handled by the low-precision arithmetic circuit, the amount of data passing through the communication path is smaller than the data amount when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision.
  • A data processing method in which first data conversion means, provided at the end point on the high-precision arithmetic processing means side of a communication path for transferring data to and from the high-precision arithmetic processing means, performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means so that the data passed to and from the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is equal to or less than the data amount when data of the first precision is used, and the precision of the data passing through the communication path is equal to or less than the first precision.
  • In the data processing method, the first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means so that the data passed to and from the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is smaller than the data amount when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision.
  • Second data conversion means, provided at the end point of the communication path on the low-precision arithmetic processing means side, performs a predetermined conversion on data passing between the communication path and the low-precision arithmetic processing means so that the data passed to and from the connected low-precision arithmetic processing means is data that the low-precision arithmetic processing means can handle.
  • The present invention is not limited to deep learning and is suitably applicable to any device that performs processing in which operations requiring high precision and operations not requiring high precision are mixed, and that is to perform that processing while suppressing power consumption.
  • REFERENCE SIGNS LIST: 10 arithmetic circuit; 11 low-precision arithmetic circuit; 12 high-precision arithmetic circuit; 13 memory; 14 control device; 15 bus; 51 unit; 52 unit connection; 53 inference process; 54 parameter update process; 100 learning device; 101 pre-learning model storage unit; 102 learning data storage unit; 103a high-efficiency inference processing unit; 103b high-precision inference processing unit; 104a high-efficiency parameter update processing unit; 104b high-precision parameter update processing unit; 105 control unit; 106 learning processing unit; 107 post-learning model storage unit; 1000 computer; 1001 CPU; 1002 main storage device; 1003 auxiliary storage device; 1004 interface; 1005 display device; 1006 input device; 1007 GPU; 1008 processor; 21 bus; 22a, 22b, 22c, 22d arithmetic circuit; 221 MAC; 222 memory layer; 223 squared addition tree; 224 ALU; 300 data processing device; 31 low-precision arithmetic processing unit; 32 high-precision arithmetic processing unit; 33 communication path; 34, 35 data conversion unit; 500 data processing device; 501 low-precision arith
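
The lower-bit reduction and the pack/unpack processing described for the data conversion units 34 and 35 can be illustrated with a minimal Python sketch. The function names, the choice of discarding four lower bits (INT16 to INT12), and the layout of two 12-bit values per three bytes are assumptions made for illustration; the specification does not fix any of them.

```python
def drop_lower_bits(value: int, src_bits: int = 16, dst_bits: int = 12) -> int:
    """Reduce a signed src_bits integer to dst_bits by discarding its lower bits."""
    return value >> (src_bits - dst_bits)  # arithmetic shift keeps the sign


def restore_bits(value: int, src_bits: int = 16, dst_bits: int = 12) -> int:
    """Scale a dst_bits value back to the src_bits range (discarded bits become zero)."""
    return value << (src_bits - dst_bits)


def pack12(values: list[int]) -> bytes:
    """Pack signed 12-bit integers, two values per three bytes (assumed layout)."""
    if len(values) % 2:
        values = values + [0]  # pad to an even count without mutating the input
    out = bytearray()
    for a, b in zip(values[0::2], values[1::2]):
        a &= 0xFFF  # 12-bit two's-complement bit pattern
        b &= 0xFFF
        out += bytes([a >> 4, ((a & 0xF) << 4) | (b >> 8), b & 0xFF])
    return bytes(out)


def unpack12(data: bytes, count: int) -> list[int]:
    """Decompose packed bytes back into `count` signed 12-bit integers."""
    values = []
    for i in range(0, len(data), 3):
        b0, b1, b2 = data[i:i + 3]
        for raw in ((b0 << 4) | (b1 >> 4), ((b1 & 0xF) << 8) | b2):
            values.append(raw - 0x1000 if raw & 0x800 else raw)  # sign-extend
    return values[:count]
```

In this sketch, five INT16 values that would occupy 10 bytes on the communication path occupy 9 bytes after reduction and packing, and each restored value differs from the original by less than 16 (the weight of the four discarded bits).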

Abstract

A data processing device 500 comprises: a low-precision computation processing means 501 for performing prescribed computation with a first precision; a high-precision computation processing means 502 for performing prescribed computation with a second precision that is higher than the first precision; and a first data conversion means 504 provided at an end point, on the high-precision-computation-processing-means 502 side, of a communication path 503 for delivering data between the two computation processing means 501, 502. The first data conversion means 504 performs prescribed conversion on the data passing between the communication path 503 and the high-precision computation processing means 502 so that the data delivered to and from the high-precision computation processing means 502 can be handled by the computation processing means 502, the amount of data passing through the communication path 503 is smaller than or equal to a data amount in a case where data of the first precision is used, and the precision of data passing through the communication path 503 is less than or equal to the first precision.

Description

Data processing device, data processing circuit, and data processing method
 The present invention relates to a data processing device, a data processing circuit, and a data processing method for performing data processing that includes operations at two different arithmetic precisions.
 As machine learning becomes more widespread, further ingenuity is required to cope with situations that change from moment to moment.
 To that end, diverse raw data acquired in the actual operating environment must be incorporated into learning as learning data. In learning that uses learning data (machine learning), the parameters of the arithmetic expressions and discriminants used in a given learner are adjusted based on, for example, the input-output relationships indicated by the learning data. The learner is, for example, a discriminant model that, when data is input, performs discrimination for one or more labels.
 Regarding the relationship between computing resources and arithmetic precision in machine learning, Non-Patent Document 1, for example, describes an example of a learning arithmetic circuit and a learning method for executing deep learning of a neural network efficiently, in particular with low power consumption.
 Non-Patent Document 2 describes an example of a learning method for deep learning in a CNN (Convolutional Neural Network) that shortens the learning time by dividing the multiple convolutional layers into layers whose weights are fixed and layers whose weights are updated (extended function layers), thereby limiting the learning range.
 As an example of a circuit configuration for learning operations in machine learning, Non-Patent Document 3 describes an example of optimizing an accelerator design based on an FPGA (Field-Programmable Gate Array).
 Most machine learning that uses learning data has been performed in cloud environments, where large-scale high-precision arithmetic circuits can be built, in order to support general-purpose learning algorithms.
 However, some sites are subject to various constraints on data movement, such as limited network bandwidth and privacy protection, so a mechanism is desired that allows learning within devices at the site (hereinafter referred to as the edge device layer) rather than in a cloud environment. This in turn calls for a learning method that achieves a sufficient recognition rate with fewer computing resources and hence lower power consumption.
 According to the learning method described in Non-Patent Document 1, learning can be achieved with lower power consumption by using a 16-bit fixed-point arithmetic circuit, compared with NVIDIA's TK1 (Jetson Kit), which performs learning using a 32-bit floating-point arithmetic circuit. However, that method merely reduces power consumption in exchange for lower arithmetic precision by reducing the bit width of the arithmetic circuit that performs all learning operations (all operations for adjusting the parameters). It gives no consideration to the adverse effects of the reduced precision of the arithmetic circuit itself, for example the risk that precision sufficient for the learning operations is not secured.
 For example, an arithmetic circuit for deep learning performs multilayer operations using a structure in which multiple units are connected in layers. These multilayer operations are broadly divided into a part that computes the outputs of the units layer by layer (so-called inference processing, for example forward propagation) and a part that performs computations to update the parameters (for example, weights) used in that computation (so-called parameter update processing, for example backpropagation). Of these, parameter update processing in particular corresponds to the actual learning operation in machine learning. The arithmetic precision of parameter update processing therefore strongly affects the recognition rate in operation, and the higher the precision, the better. In contrast, the arithmetic precision of inference processing often need not be very high.
 Therefore, if, among the operations included in the learning process, only those requiring high precision are performed with high precision and those not requiring high precision are performed with low precision, learning with sufficient precision becomes possible while power consumption is reduced. Consider, then, a device that performs processing in which operations requiring high precision and operations not requiring high precision are mixed, that has arithmetic circuits of two arithmetic precisions, and that executes the processing while switching the circuit on which each operation is performed according to the precision that operation requires. In such a device, data exchange between cores (arithmetic circuits) of different arithmetic precision is indispensable. To further improve the efficiency of learning processing that includes such data exchange, speeding up the data exchange between cores is important.
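
The division of work described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a single weight `w` stands in for the model, the low-precision core is imitated by INT16 fixed-point arithmetic with an assumed 8 fractional bits, Python floats stand in for FP32 on the high-precision core, and every function name is hypothetical.

```python
FRAC_BITS = 8  # assumption: INT16 fixed point with 8 fractional bits on the low-precision core


def saturate_i16(q: int) -> int:
    """Clamp an integer to the INT16 range."""
    return max(-32768, min(32767, q))


def to_fixed(x: float) -> int:
    """Convert a float to the assumed INT16 fixed-point format."""
    return saturate_i16(round(x * (1 << FRAC_BITS)))


def to_float(q: int) -> float:
    """Convert an INT16 fixed-point value back to a float."""
    return q / (1 << FRAC_BITS)


def low_precision_forward(w_q: int, x_q: int) -> int:
    """Inference step (forward propagation) computed in INT16 fixed point."""
    return saturate_i16((w_q * x_q) >> FRAC_BITS)


def high_precision_update(w: float, x: float, y: float, target: float, lr: float) -> float:
    """Parameter update step computed in floating point."""
    grad = 2.0 * (y - target) * x  # derivative of (w * x - target) ** 2 with respect to w
    return w - lr * grad


def train(samples, w: float = 0.0, lr: float = 0.05, epochs: int = 200) -> float:
    """Alternate low-precision inference and high-precision parameter updates."""
    for _ in range(epochs):
        for x, target in samples:
            y_q = low_precision_forward(to_fixed(w), to_fixed(x))       # low-precision core
            w = high_precision_update(w, x, to_float(y_q), target, lr)  # high-precision core
    return w
```

Each call pair in `train` crosses between the two cores once in each direction, which is the inter-core data exchange whose cost the invention aims to reduce.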
 Note that the learning method described in Non-Patent Document 2 merely aims to shorten the learning time by limiting the learning range, and gives no consideration to improving the efficiency of learning processing that includes data exchange between cores of different arithmetic precision, in particular the efficiency of data exchange between cores of different precision. Likewise, the method described in Non-Patent Document 3 merely aims to reduce the circuit scale and computation time by optimizing the configuration of the circuit that performs all learning operations, and again gives no consideration to improving the efficiency of learning processing that includes data exchange between cores of different arithmetic precision, in particular the efficiency of data exchange between cores of different precision.
 In view of the problems described above, an object of the present invention is to provide a data processing device, a data processing circuit, and a data processing method that can further improve the efficiency of processing in which operations requiring high precision and operations not requiring high precision are mixed.
 A data processing device according to the present invention comprises: low-precision arithmetic processing means that performs a predetermined operation with a first precision; high-precision arithmetic processing means that performs a predetermined operation with a second precision higher than the first precision; and first data conversion means provided at the end point, on the high-precision arithmetic processing means side, of a communication path for transferring data between the high-precision arithmetic processing means and the low-precision arithmetic processing means. The first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means so that the data passed to and from the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is equal to or less than the data amount when data of the first precision is used, and the precision of the data passing through the communication path is equal to or less than the first precision.
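
As a rough illustration of the first data conversion means, the following Python sketch places a converter at the high-precision end of the path so that only INT16 words travel on the path while the high-precision side sees floating-point values. The fixed-point scale `SCALE`, the little-endian byte layout, and the function names are assumptions for illustration, not details from the specification.

```python
import struct

SCALE = 256  # assumption: fixed-point scale shared by both cores (8 fractional bits)


def to_path(values: list[float]) -> bytes:
    """High-precision side to path: convert float values to INT16 words."""
    q = [max(-32768, min(32767, round(v * SCALE))) for v in values]
    return struct.pack(f"<{len(q)}h", *q)


def from_path(payload: bytes) -> list[float]:
    """Path to high-precision side: convert INT16 words back to float values."""
    count = len(payload) // 2
    return [q / SCALE for q in struct.unpack(f"<{count}h", payload)]
```

With this arrangement three values occupy 6 bytes on the path, compared with 12 bytes if they were sent as FP32, so the amount of data on the path does not exceed the amount for first-precision (here INT16) data.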
 A data processing circuit according to the present invention comprises: a low-precision arithmetic circuit that performs a predetermined operation with a first precision; a high-precision arithmetic circuit that performs a predetermined operation with a second precision higher than the first precision; and a first data conversion circuit that is provided at the end point, on the high-precision arithmetic circuit side, of a communication path for transferring data between the high-precision arithmetic circuit and the low-precision arithmetic circuit, and that performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic circuit. The data passed between the first data conversion circuit and the connected high-precision arithmetic circuit is data handled by the high-precision arithmetic circuit, the amount of data passing through the communication path is equal to or less than the data amount when data of the first precision is used, and the precision of the data passing through the communication path is equal to or less than the first precision.
In a data processing method according to the present invention, first data conversion means is provided at the end point, on the high-precision arithmetic processing means side, of a communication path for transferring data between low-precision arithmetic processing means that performs a predetermined operation with a first precision and high-precision arithmetic processing means that performs a predetermined operation with a second precision higher than the first precision. The first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means such that the data exchanged with the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is equal to or less than the amount of data when first-precision data is used, and the precision of the data passing through the communication path is equal to or less than the first precision.
The data processing method according to the present invention may also be configured as follows. The first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means such that the data exchanged with the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is smaller than the amount of data when first-precision data is used, and the precision of the data passing through the communication path is lower than the first precision. Second data conversion means, provided at the end point of the communication path on the low-precision arithmetic processing means side, performs a predetermined conversion on data passing between the communication path and the low-precision arithmetic processing means such that the data exchanged with the connected low-precision arithmetic processing means is data that the low-precision arithmetic processing means can handle, the amount of data passing through the communication path is smaller than the amount of data when first-precision data is used, and the precision of the data passing through the communication path is lower than the first precision.
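The role of a data converter at the high-precision end of the communication path can be illustrated with a minimal sketch. It assumes, purely for illustration, that the first precision is signed 8-bit fixed point and that the second precision is a Python float; the names `FRAC_BITS`, `to_channel`, and `from_channel` are hypothetical and do not come from the specification.

```python
# Hypothetical sketch of a converter at the high-precision end of the path.
# Assumption: first precision = signed 8-bit fixed point with FRAC_BITS
# fractional bits; second precision = Python float.
FRAC_BITS = 4
SCALE = 1 << FRAC_BITS


def to_channel(values):
    """High-precision side -> communication path: floats to 8-bit codes."""
    codes = []
    for v in values:
        q = round(v * SCALE)
        q = max(-128, min(127, q))  # saturate to the signed 8-bit range
        codes.append(q & 0xFF)      # two's-complement byte
    return bytes(codes)


def from_channel(payload):
    """Communication path -> high-precision side: 8-bit codes to floats."""
    values = []
    for b in payload:
        q = b - 256 if b >= 128 else b  # undo two's complement
        values.append(q / SCALE)
    return values
```

With this layout, each value crosses the path as one byte, so the data amount on the path does not exceed what first-precision data would need, while the high-precision side only ever sees floats.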
According to the present invention, processing in which operations that require high precision and operations that do not require high precision are mixed can be made more efficient.
An explanatory diagram outlining a learning method as an example of the data processing method of the present invention.
An explanatory diagram showing an example of the inputs and outputs of a unit and its connections with other units.
A block diagram showing a configuration example of the learning device of the first embodiment.
A configuration diagram showing an example of the hardware configuration of the learning processing unit 106.
An explanatory diagram showing examples of combinations of the arithmetic precision of the low-precision arithmetic circuit 11 and the arithmetic precision of the high-precision arithmetic circuit 12.
A schematic block diagram showing a configuration example of a computer for the learning device 100.
A schematic configuration diagram showing an example of the arithmetic circuit.
A schematic configuration diagram showing another example of the arithmetic circuit.
A schematic configuration diagram showing another example of the arithmetic circuit.
A schematic configuration diagram showing another example of the arithmetic circuit.
A flowchart showing an example of the operation of the learning device 100 of the first embodiment.
A flowchart showing a more specific operation example of the learning device 100.
A flowchart showing another example of a more specific operation of the learning device 100.
A flowchart showing another example of a more specific operation of the learning device 100.
An explanatory diagram showing a configuration example of the data processing device of the second embodiment.
An explanatory diagram showing a configuration example of the data processing device of the second embodiment.
A block diagram showing an overview of the data processing device of the present invention.
A configuration diagram showing a configuration of the data processing circuit of the present invention.
A configuration diagram showing another configuration of the data processing circuit of the present invention.
Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the following, the present invention is described using the learning process in deep learning as an example of processing in which operations that require high precision and operations that do not require high precision are mixed. However, the processing, devices, and data processing methods to which the present invention applies are not limited to a learning process, a learning device, and a learning method.
First, an outline of a learning process as an example of the data processing of the present invention will be described. FIG. 1(a) is an explanatory diagram showing an example of a general learning method for a neural network that includes one or more intermediate layers between an input layer and an output layer, together with a circuit configuration for it. FIG. 1(b) is an explanatory diagram showing an example of a learning method as an example of the data processing method of the present invention, together with a circuit configuration for it.
In the example shown in FIG. 1(a), a large-scale learning circuit 90 is used to train the entire neural network, which is a predetermined discriminant model, so as to support general-purpose learning algorithms.
In FIG. 1, the balloons attached to the circuits schematically show the direction and range of processing in the learning process of the neural network. In each balloon, reference numeral 51 (a circle in the figure) denotes a unit corresponding to a neuron in the neural network. Reference numeral 52 (a line connecting units in the figure) denotes an inter-unit connection. Reference numeral 53 (a thick right-pointing arrow in the figure) denotes inference processing and its range. Reference numeral 54 (a thick left-pointing arrow in the figure) denotes parameter update processing and its range. FIG. 1 shows an example of a feedforward neural network in which the input to each unit is the output of units in the preceding layer, but the input to each unit is not limited to this. For example, when time-series information is retained, the input to each unit may include the output of units in the preceding layer at the previous time, as in a recurrent neural network. In that case as well, the direction of inference processing is regarded as the direction from the input layer toward the output layer (the forward direction). Inference processing performed in a predetermined order from the input layer in this way is also called "forward propagation." The direction of parameter update processing, on the other hand, is not particularly limited. It may be the direction from the output layer toward the input layer (the reverse direction), as in the parameter update processing in the figure. The direction shown in the figure is that of the error backpropagation method, but parameter update processing is not limited to error backpropagation. For example, the parameter update processing may be STDP (Spike Timing Dependent Plasticity).
The following is an example of a method for training a model in deep learning, not limited to neural networks. First, after learning data is input to the input layer, inference processing is performed that computes the output of each unit in the forward direction, layer by layer, up to the output layer (forward propagation: see arrow 53 in the figure). Next, based on an error calculated from the output of the output layer (the final output) and, for example, the input-output relationship indicated by the learning data, parameter update processing is performed. This processing traces the layers in the reverse direction from the output layer to the first layer and updates the parameters used to compute the output of each unit in each layer so as to minimize the error (backpropagation: see arrow 54 in the figure).
As shown in FIG. 1(a), when the entire model is the learning target, the parameter update processing updates, in every layer after the input layer (the first to n-th layers), the parameters used to compute the output of each unit in the layer (for example, the weights of the inter-unit connections that connect each unit in the layer to units in other layers). By repeating such parameter update processing multiple times, for example while changing the learning data, a trained model with a high recognition rate can be generated. FIG. 1(a) shows, as an implementation example of an arithmetic circuit that performs such learning, a large-scale learning circuit 90 that performs the above inference processing and parameter update processing with high arithmetic precision. However, the higher the arithmetic precision of the inference processing and parameter update processing, and the wider the calculation range of that processing, the more expansion terms the error function has and the larger the circuit becomes, so power consumption increases greatly.
In the present invention, by contrast, only part of the model is the learning target, as shown in FIG. 1(b). As above, "learning" here refers to the parameter update processing, which is the actual learning step. When only part of the model is the learning target, processing up to forward propagation is performed in the same way as described above. Then, based on an error calculated from the output of the output layer (the final output) and, for example, the input-output relationship indicated by the learning data, parameter update processing is performed only for designated units (for example, the units in each layer from the n-th layer, which is the output layer, to the k-th layer). This processing updates the parameters used to compute the output of those units (for example, the weights of their connections with other units).
FIG. 1(b) shows, as an implementation example of an arithmetic circuit 10 that performs such learning, a combination of a high-precision arithmetic circuit 12, which performs parameter update processing for some designated units with high arithmetic precision, and a low-precision arithmetic circuit 11, which performs inference processing for at least the designated units with lower arithmetic precision than the high-precision arithmetic circuit 12. With two arithmetic circuits of different arithmetic precision provided in this way, the high-precision arithmetic circuit 12 is made to perform, for example, parameter update processing for the units that require high-precision operations, and the low-precision arithmetic circuit 11 is made to perform the other processing that does not require high-precision operations. Thus, within the learning operation for one piece of learning data, at least part of the inference processing is performed with low arithmetic precision and at least part of the parameter update processing is performed with high arithmetic precision. By also optimizing the range of the parameter update processing performed with high arithmetic precision, sufficient arithmetic precision is secured while computer resources are used more efficiently (for example, with lower power consumption).
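The division of labor between the two circuits can be sketched for a single unit as follows. This is only an illustration under stated assumptions: the low precision is modeled as signed 8-bit fixed point with 4 fractional bits, the high precision as Python floats, the activation is the identity, and the update is one gradient-descent step on a squared error. The names `quantize` and `train_step` are hypothetical.

```python
def quantize(v, frac_bits=4):
    """Round v onto an assumed signed 8-bit fixed-point grid (as a float)."""
    scale = 1 << frac_bits
    q = max(-128, min(127, round(v * scale)))
    return q / scale


def train_step(w_master, x, target, lr=0.125):
    """One learning step for a single unit with identity activation."""
    # Low-precision circuit: inference on quantized weights and inputs.
    w_low = [quantize(w) for w in w_master]
    x_low = [quantize(v) for v in x]
    z = quantize(sum(w * v for w, v in zip(w_low, x_low)))
    # High-precision circuit: parameter update on the float weights.
    err = z - target
    w_new = [w - lr * err * v for w, v in zip(w_master, x)]
    return w_new, z
```

Keeping the weights in floats on the high-precision side lets small updates accumulate even when they are below the resolution of the low-precision format.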
FIG. 1(b) shows an example in which some layers on the output side are the range in which parameters are updated (the actual learning range), but the parameter update range is not limited to layers on the output side. For example, layers can also be designated individually, such as the odd-numbered or even-numbered layers among the first to n-th layers. FIG. 1(b) also shows an example that limits the range of the parameter update processing itself, but it is equally possible to leave that range unrestricted and to limit only the range of the parameter update processing performed with high arithmetic precision. That is, parameter update processing may be performed with high arithmetic precision for only some of the units and with low arithmetic precision for the other units. The targets of parameter update processing can also be divided into three types: units updated by high-precision operations, units updated by low-precision operations, and units that are not updated (whose parameters are then fixed).
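The three-way classification of update targets can be written down as a small planning table. The sketch below is illustrative only: the enum `UpdateMode`, the function `plan_updates`, and the rule "layers k to n are updated with high precision" are assumptions chosen to mirror the FIG. 1(b) example, not a scheme prescribed by the specification.

```python
from enum import Enum


class UpdateMode(Enum):
    HIGH = "high-precision update"
    LOW = "low-precision update"
    FROZEN = "no update (parameters fixed)"


def plan_updates(n_layers, k, low_layers=()):
    """Assign an update mode to each layer 1..n_layers.

    Layers k..n_layers (the output side) are updated with high precision,
    layers listed in low_layers with low precision, and the rest are frozen.
    """
    plan = {}
    for layer in range(1, n_layers + 1):
        if layer >= k:
            plan[layer] = UpdateMode.HIGH
        elif layer in low_layers:
            plan[layer] = UpdateMode.LOW
        else:
            plan[layer] = UpdateMode.FROZEN
    return plan
```

The same table could be keyed by unit or by parameter instead of by layer, since the text allows individual designation.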
As another way of dividing processing between high-precision and low-precision operations, the inference processing of all units may be performed with low-precision operations and the parameter update processing of all units with high-precision operations. Alternatively, for example, the inference processing of all units may be performed with low-precision operations and the parameter update processing of only some units with high-precision operations. In that case, the remaining units excluded from high-precision operations may have their parameters updated with low-precision operations or may be excluded from parameter update processing altogether. It is also possible, for example, to perform inference processing and parameter update processing with low-precision operations for some units and with high-precision operations for the remaining units.
In other words, in the learning method as an example of the data processing method of the present invention, it suffices that the learning device includes a low-precision arithmetic circuit with relatively low arithmetic precision and a high-precision arithmetic circuit with relatively high arithmetic precision, causes the low-precision arithmetic circuit to perform inference processing for at least some units, and causes the high-precision arithmetic circuit to perform parameter update processing for at least some units. The inference processing of the remaining units may then be performed by either the low-precision arithmetic circuit or the high-precision arithmetic circuit. The parameter update processing of the remaining units may be performed by the low-precision arithmetic circuit or may be omitted. There is no particular limitation on which units are subject to high-precision or low-precision inference processing, or on which units are subject to high-precision parameter update processing, subject to low-precision parameter update processing, or excluded from processing.
The above is an example that uses two arithmetic circuits with different arithmetic precision, but the same basically applies when two or more arithmetic circuits with different arithmetic precision are used. That is, as long as the parameter update processing of some units is performed by an arithmetic circuit whose arithmetic precision is higher than that of the arithmetic circuit that performs the inference processing of some units, there is no particular limitation on which arithmetic circuit performs the inference processing and parameter update processing of the other units, or on whether that processing is performed at all.
FIG. 2 is an explanatory diagram showing, for a single unit, an example of the inputs and outputs of that unit and its connections with other units. FIG. 2(a) shows an example of the inputs and output of one unit, and FIG. 2(b) shows an example of the connections between units arranged in two layers. As shown in FIG. 2(a), when one unit has four inputs ($x_1$ to $x_4$) and one output ($z$), the operation of the unit is expressed, for example, by Equation (1A), where $f(\cdot)$ denotes the activation function.

$$z = f(u) \tag{1A}$$

where

$$u = a + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 \tag{1B}$$

In Equation (1B), $a$ is the intercept, and $w_1$ to $w_4$ are parameters such as the weights corresponding to the inputs $x_1$ to $x_4$.
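Equations (1A) and (1B) translate directly into code. The sketch below is a minimal illustration; the specification does not fix the activation function $f$, so the ReLU used here is an assumption, and the names `relu` and `unit_output` are hypothetical.

```python
def relu(u):
    """Assumed activation function f; the text leaves f unspecified."""
    return max(0.0, u)


def unit_output(x, w, a, f=relu):
    """Compute z = f(u) with u = a + sum_k w_k * x_k (Equations 1A, 1B)."""
    u = a + sum(w_k * x_k for w_k, x_k in zip(w, x))
    return f(u)
```

For example, with inputs `[1, 2, 3, 4]`, weights `[0.5, -1, 0.25, 0]`, and intercept `0.5`, the weighted sum is `u = -0.25`, which the assumed ReLU maps to `0.0`.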
On the other hand, as shown in FIG. 2(b), when units are connected between two layers, focusing on the latter layer, the output $z_i$ of each unit in that layer for the inputs to the unit ($x_1$ to $x_4$) is expressed, for example, as follows. Here, $i$ is the identifier of a unit within the same layer ($i = 1$ to $3$ in this example).

$$z_i = f(u_i) \tag{2A}$$

where

$$u_i = a + w_{i,1} x_1 + w_{i,2} x_2 + w_{i,3} x_3 + w_{i,4} x_4 \tag{2B}$$
In the following, Equation (2B) may be simplified and written as $z_i = \sum_k w_{i,k} x_k$, with the intercept $a$ omitted. The intercept $a$ can also be regarded as the coefficient (one of the parameters) of a constant term of value 1. Here, $k$ identifies an input to each unit in the layer, more specifically the other unit that provides that input. When the inputs to each unit in the layer are only the outputs of the units in the preceding layer, the simplified expression can also be written as $u_i^{(L)} = \sum_k w_{i,k}^{(L)} z_k^{(L-1)}$, where $L$ is the layer identifier. In these expressions, $w_{i,k}$ corresponds to a parameter of each unit $i$ in the layer (the $L$-th layer), more specifically the weight of the connection (inter-unit connection) between unit $i$ and another unit $k$. In the following, where units are not distinguished, the function that determines the output value of a unit (the activation function) may be simplified and written as $z = \sum w x$.
In the above example, computing the output $z$ from the input $x$ for a unit corresponds to the inference processing of that unit. During this computation, the parameter $w$ is fixed. Computing the parameter $w$ for a unit, on the other hand, corresponds to the parameter update processing of that unit.
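The distinction between inference (computing $z$ with $w$ fixed) and parameter update (computing a new $w$) can be sketched for one layer as follows. The update rule is an assumption made for illustration: one gradient-descent step on the squared error $\tfrac{1}{2}\sum_i (z_i - t_i)^2$ with identity activation. The names `layer_forward`, `update_parameters`, `target`, and `lr` are hypothetical.

```python
def layer_forward(W, x, f=lambda u: u):
    """Inference: z_i = f(sum_k w_ik * x_k), with W held fixed."""
    return [f(sum(w_ik * x_k for w_ik, x_k in zip(row, x))) for row in W]


def update_parameters(W, x, target, lr=0.125):
    """Parameter update: one gradient step on 0.5 * sum((z_i - t_i)^2).

    Assumes identity activation, so dE/dw_ik = (z_i - t_i) * x_k.
    """
    z = layer_forward(W, x)
    return [
        [w_ik - lr * (z_i - t_i) * x_k for w_ik, x_k in zip(row, x)]
        for row, z_i, t_i in zip(W, z, target)
    ]
```

In this sketch `layer_forward` never modifies `W`, while `update_parameters` returns a new weight matrix and leaves the input `x` untouched, mirroring the two roles described in the text.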
Embodiment 1.
FIG. 3 is a block diagram showing a configuration example of the learning device of the first embodiment. The learning device 100 shown in FIG. 3 includes a pre-learning model storage unit 101, a learning data storage unit 102, a learning processing unit 106, and a post-learning model storage unit 107.
The pre-learning model storage unit 101 stores information on the model before learning. This information may include the initial values of the parameters.
The learning data storage unit 102 stores learning data, which is the data used to train the model. The format of the learning data is not particularly limited.
The learning processing unit 106 trains the model stored in the pre-learning model storage unit 101 using the learning data stored in the learning data storage unit 102.
The learning processing unit 106 of the present embodiment includes at least a high-efficiency inference processing unit 103a, a high-precision parameter update processing unit 104b, and a control unit 105. As shown in FIG. 3, the learning processing unit 106 may further include a high-precision inference processing unit 103b and a high-efficiency parameter update processing unit 104a.
The high-efficiency inference processing unit 103a performs inference processing for a designated layer or unit with a first arithmetic precision.
The high-precision parameter update processing unit 104b performs parameter update processing for a designated layer, unit, or parameter with a second arithmetic precision that is higher than the first arithmetic precision.
The control unit 105 controls the processing units that carry out the learning process (in this example, the high-efficiency inference processing unit 103a, the high-precision inference processing unit 103b, the high-efficiency parameter update processing unit 104a, and the high-precision parameter update processing unit 104b) so that the necessary learning process is performed. More specifically, the control unit 105 reads the pre-learning model and the learning data, and controls the switching of the arithmetic precision used in the learning process by issuing operation instructions to the processing units that carry out the learning process. An operation instruction includes the designation of the units to be operated on and the input of the parameters required for the operation.
The post-learning model storage unit 107 stores information on the model after learning. This information may include the updated parameter values of each unit.
FIG. 4 is a configuration diagram showing an example of the hardware configuration of the learning processing unit 106. As shown in FIG. 4, the learning processing unit 106 may be realized by, for example, an arithmetic processing device in which a low-precision arithmetic circuit 11, a high-precision arithmetic circuit 12, a memory 13, and a control device 14 are connected to one another via a bus 15. The high-precision arithmetic circuit 12 may be any circuit that can perform operations with higher arithmetic precision than the low-precision arithmetic circuit 11.
In that case, the high-efficiency inference processing unit 103a and the high-efficiency parameter update processing unit 104a may be realized by, for example, the low-precision arithmetic circuit 11. The high-precision inference processing unit 103b and the high-precision parameter update processing unit 104b may be realized by, for example, the high-precision arithmetic circuit 12. The control unit 105 may be realized by, for example, the control device 14.
In this example, the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 are each connected to the bus 15 and can exchange data through it, for example by notifying each other of their operation results. The memory 13 may also be connected to the bus 15, in which case the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 can also exchange data via the memory 13. The memory 13 is then treated as part of the communication path. The memory 13 may be mounted as on-chip memory on the same chip as the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12. That is, the low-precision arithmetic circuit 11, the high-precision arithmetic circuit 12, and the memory 13 may be connected internally within the chip. Alternatively, the memory 13 may be off-chip memory that is not mounted on the same chip as the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12. That is, it may be connected externally via an external memory interface.
In the present embodiment, "precision" or "arithmetic precision" refers to a measure of the breadth and fineness of the range of numeric values that a processing unit carrying out the learning process (in particular, inference processing and parameter update processing) actually uses in its operations. More specifically, it is the measure of the breadth and fineness of the numeric range determined by, for example, the bit width and the handling of the decimal point in the arithmetic circuit that realizes the processing unit. Examples of combinations of the low arithmetic precision, which is the arithmetic precision of the low-precision arithmetic circuit 11, and the high arithmetic precision, which is the arithmetic precision of the high-precision arithmetic circuit 12, include those shown in FIG. 5. FIG. 5 is an explanatory diagram showing examples of such combinations.
The combination of the operation precision of the low-precision arithmetic circuit 11 and that of the high-precision arithmetic circuit 12 is not limited to those shown in FIG. 5. For example, the operation precision of the low-precision arithmetic circuit 11 (low operation precision) may be any of {1, 2, 8, 16}-bit fixed point or {1, 2, 8, 16}-bit integer, and the operation precision of the high-precision arithmetic circuit 12 (high operation precision) may be any of {2, 8, 16, 32}-bit fixed point, {9, 16, 32}-bit floating point, or {8, 16, 24, 32}-bit power-of-2 floating point. In every case, however, the high operation precision must be higher than the low operation precision, meaning that more significant digits can be represented (for example, the value range of the numerical data is wider or finer).
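To make the notions of "breadth" and "fineness" concrete, the following is a minimal sketch of a signed fixed-point format. The function names, the split into total and fractional bits, and the round-to-nearest with saturation policy are illustrative assumptions; the text does not specify them.

```python
def fixed_point_format(total_bits: int, frac_bits: int):
    """Return (min_value, max_value, step) of a signed fixed-point format.

    min/max describe the breadth of the value range, step its fineness.
    (Hypothetical helper; the bit split is an assumption.)
    """
    step = 1.0 / (1 << frac_bits)
    min_value = -(1 << (total_bits - 1)) * step
    max_value = ((1 << (total_bits - 1)) - 1) * step
    return min_value, max_value, step


def quantize_fixed(value: float, total_bits: int, frac_bits: int) -> float:
    """Round to the nearest representable value and saturate at the range limits."""
    min_value, max_value, step = fixed_point_format(total_bits, frac_bits)
    rounded = round(value / step) * step
    return max(min_value, min(max_value, rounded))


# An 8-bit format with 4 fractional bits versus a 16-bit format with 8.
low = quantize_fixed(0.3, total_bits=8, frac_bits=4)
high = quantize_fixed(0.3, total_bits=16, frac_bits=8)
```

With these assumed formats, the 16-bit value lies closer to 0.3 than the 8-bit value, which is the sense in which the wider format is "higher precision".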
FIG. 6 is a schematic block diagram showing a configuration example of a computer for the learning device 100. The computer 1000 includes a processor 1008, a main storage device 1002, an auxiliary storage device 1003, an interface 1004, a display device 1005, and an input device 1006. The processor 1008 may include various arithmetic and processing devices such as a CPU 1001 and a GPU 1007.
The learning device 100 may be implemented on, for example, the computer 1000 shown in FIG. 6. In that case, the operation of the learning device 100 (in particular, the control unit 105) may be stored in the auxiliary storage device 1003 in the form of a program. The CPU 1001 reads the program from the auxiliary storage device 1003, loads it into the main storage device 1002, and performs the predetermined processing of the learning device 100 in accordance with the program. The CPU 1001 is one example of an information processing device that operates according to a program. Besides a CPU (Central Processing Unit), the computer 1000 may include, for example, an MPU (Micro Processing Unit), an MCU (Memory Control Unit), or a GPU (Graphics Processing Unit).
FIG. 6 shows an example in which the computer 1000 includes, in addition to the CPU 1001, a GPU 1007 that implements the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 described above. This is not the only possibility: when the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 are implemented by another processor or arithmetic device (such as a MAC (multiplier-accumulator), a multiplier tree, or an ALU (Arithmetic Logic Unit) array, described later), the computer 1000 need only include that processor or arithmetic device. The low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 may also be mounted on different chips, and the specific chip configuration is not particularly limited.
The auxiliary storage device 1003 is an example of a non-transitory tangible medium. Other examples of non-transitory tangible media include a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD-ROM, and a semiconductor memory connected via the interface 1004. When the program is distributed to the computer 1000 over a communication line, the computer 1000 that receives it may load the program into the main storage device 1002 and execute the predetermined processing of the learning device 100.
The program may implement only part of the predetermined processing of the learning device 100. Furthermore, the program may be a difference program that implements the predetermined processing of the learning device 100 in combination with another program already stored in the auxiliary storage device 1003.
The interface 1004 transmits and receives information to and from other devices. The display device 1005 presents information to the user. The input device 1006 accepts input of information from the user.
Depending on the processing performed by the learning device 100, some elements of the computer 1000 may be omitted. For example, if the computer 1000 does not present information to the user, the display device 1005 may be omitted. Likewise, if the computer 1000 does not accept input from the user, the input device 1006 may be omitted.
Some or all of the components described above are implemented by general-purpose or dedicated circuitry, a processor, or the like, or by a combination of these. They may be constituted by a single chip or by a plurality of chips connected via a bus. Some or all of the components may also be realized by a combination of the above circuitry and a program.
When some or all of the components described above are realized by a plurality of information processing devices, circuits, or the like, these may be arranged centrally or in a distributed manner. For example, the information processing devices, circuits, and the like may be realized in a form in which each is connected via a communication network, such as a client-server system or a cloud computing system.
[Circuit configuration]
Next, several configurations of an inference circuit are illustrated as implementation examples of at least the high-efficiency inference processing unit 103a. For each unit in a specified layer, or for a specified unit, the high-efficiency inference processing unit 103a may, on receiving the input to that unit, perform inference processing that calculates the output of the unit at a predetermined low operation precision and output the result. In doing so, the high-efficiency inference processing unit 103a may accept as inputs the input values and the values of the other variables (parameters such as weights and intercepts) used to calculate the output of the unit. Hereinafter, an operation performed in the inference processing may be referred to as an inference operation.
In the following, a circuit for performing inference operations is called an "inference circuit". In particular, a circuit that performs inference operations at an operation precision lower than that of the parameter update operations performed by the high-precision parameter update processing unit 104b is called a "high-efficiency inference circuit". Making the operation precision of the inference circuit as low as possible, and at least lower than that of the parameter update operations performed by the high-precision parameter update processing unit 104b (for example, reducing the bit width from 32 bits to 16 bits, or replacing floating-point operations with fixed-point operations), reduces power consumption. To distinguish it from the high-efficiency inference circuit, a circuit that performs inference operations at the same operation precision as the parameter update operations performed by the high-precision parameter update processing unit 104b may be called a "high-precision inference circuit". The high-precision inference processing unit mentioned above (not shown) may be realized by such a high-precision inference circuit.
The inference circuit configurations shown below can be realized regardless of whether the circuit performs inference operations at high or low precision. That is, the high-efficiency inference processing unit 103a and the high-precision inference processing unit 103b may differ only in the precision of the variables, adders, and multipliers used for operations in the arithmetic circuit that implements each processing unit.
The simplest example of an inference circuit is a configuration with a single multiplier-accumulator (MAC) 221, which combines a multiplier and an adder (see the arithmetic circuit 22a in FIG. 7(a)). Reference numeral 21 denotes a bus.
The MAC 221 may include a multiplier, an adder, storage elements that hold three inputs, and a storage element that holds one output (see FIG. 7(b)). The MAC 221 shown in FIG. 7(b) is an example of an arithmetic circuit that, on receiving three variables a, w, and x, calculates one output variable z = a + w * x. In this example, z corresponds to the output of the unit, a and w to parameters (fixed during inference processing), and x to the input of the unit. In such a configuration, the operation precision of the circuit is determined by the bit width and the handling of the decimal point (floating point, fixed point, and so on) of the multiplier and adder it contains. For example, when the high-efficiency inference processing unit 103a is realized by the arithmetic circuit 22a, the variables (a, w, x, z) and the operations of the adder and multiplier in the MAC 221 of that circuit need only support the low operation precision (first operation precision). The variables, the addition, and the multiplication in the circuit need not all have the same precision (the same applies below). For example, it is sufficient that the precision used for any of the variables, the addition, or the multiplication is lower than the precision used for any of the variables, the addition, or the multiplication in the arithmetic circuit that realizes the high-precision parameter update processing unit 104b.
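As an illustration of how the same MAC structure can run at either precision, the following sketch computes z = a + w * x with an optional quantizer applied to the operands, the product, and the sum. The `to_fixed` helper and its 8-bit format with 4 fractional bits are assumptions chosen for the example, not values taken from the text.

```python
def to_fixed(value, frac_bits=4, total_bits=8):
    """Assumed low-precision format: signed fixed point, round-to-nearest, saturating."""
    step = 1.0 / (1 << frac_bits)
    limit = 1 << (total_bits - 1)
    q = max(-limit, min(limit - 1, round(value / step)))
    return q * step


def mac(a, w, x, quantize=lambda v: v):
    """z = a + w * x, with `quantize` modelling the precision of each stage."""
    a, w, x = quantize(a), quantize(w), quantize(x)
    product = quantize(w * x)      # multiplier output
    return quantize(a + product)   # adder output


z_high = mac(0.5, 0.3, 2.0)                     # Python floats stand in for high precision
z_low = mac(0.5, 0.3, 2.0, quantize=to_fixed)   # same structure, low precision
```

Only the quantizer differs between the two calls, mirroring the statement that the high-efficiency and high-precision units may differ only in the precision of their variables, adders, and multipliers.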
FIGS. 8 to 10 are schematic configuration diagrams showing other examples of arithmetic circuits for inference operations (inference circuits). The inference circuit may, for example, have a configuration in which a plurality of MACs 221 are connected in parallel (a so-called GPU configuration), as in the arithmetic circuit 22b shown in FIG. 8. In this configuration too, the operation precision of the circuit is determined by the bit width and the handling of the decimal point (floating point, fixed point, and so on) of the multipliers and adders it contains.
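The parallel arrangement can be pictured as one MAC per unit, each accumulating over the same input vector. The sketch below is a sequential software stand-in for that hardware parallelism; the function names and the mapping of one MAC to one unit are assumptions made for illustration.

```python
def mac(a, w, x):
    """Single multiply-accumulate: a + w * x."""
    return a + w * x


def unit_output(bias, weights, inputs):
    """One MAC used repeatedly: start from the bias and accumulate w_i * x_i."""
    acc = bias
    for w, x in zip(weights, inputs):
        acc = mac(acc, w, x)
    return acc


def layer_outputs(biases, weight_rows, inputs):
    """One (conceptually parallel) MAC per unit of the layer."""
    return [unit_output(b, row, inputs) for b, row in zip(biases, weight_rows)]


outputs = layer_outputs([0.0, 1.0], [[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0])
```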
The inference circuit may also have a configuration in which a plurality of multiply-add trees 223 are connected in parallel via a memory layer 222, as in the arithmetic circuit 22c shown in FIG. 9. The multiply-add tree 223 shown in FIG. 9 is a circuit in which four multipliers, two adders, and one further adder are connected in a tree. An example of the arithmetic circuit 22c shown in FIG. 9 is also disclosed in Non-Patent Document 3. In this configuration too, the operation precision of the circuit is determined by the bit width and the handling of the decimal point (floating point, fixed point, and so on) of the multipliers and adders it contains.
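The dataflow of one such tree, with four multipliers feeding two adders that feed a final adder, can be sketched as follows. The function name and the pairing of products at the first adder level are assumptions; the figure itself is not reproduced here.

```python
def multiply_add_tree(weights, inputs):
    """Sum of four products, computed as a two-level adder tree."""
    if len(weights) != 4 or len(inputs) != 4:
        raise ValueError("this sketch models a tree with exactly four multipliers")
    # Level 0: four multipliers.
    p0, p1, p2, p3 = (w * x for w, x in zip(weights, inputs))
    # Level 1: two adders.
    s0 = p0 + p1
    s1 = p2 + p3
    # Level 2: one adder.
    return s0 + s1


result = multiply_add_tree([1, 2, 3, 4], [5, 6, 7, 8])
```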
The inference circuit may also have a configuration in which a plurality of ALUs 224 are connected in an array via the memory layer 222 (a systolic array configuration), as in the arithmetic circuit 22d shown in FIG. 10. An example of the arithmetic circuit 22d shown in FIG. 10 is also disclosed in Non-Patent Document 1. In this configuration too, the operation precision of the circuit is determined by the bit width and the handling of the decimal point (floating point, fixed point, and so on) of the multipliers and adders it contains.
For example, when the high-efficiency inference processing unit 103a is realized by the arithmetic circuit 22b, 22c, or 22d shown in FIGS. 8 to 10, the variables used for operations in that circuit and the operations of its adders or multipliers need only support the low operation precision (first operation precision).
Conversely, when the high-precision inference processing unit 103b is realized by the arithmetic circuit 22a, 22b, 22c, or 22d, the variables used for operations in that circuit and the operations of its adders or multipliers need only support the high operation precision (second operation precision).
Next, several configurations of a parameter update circuit are illustrated as implementation examples of at least the high-precision parameter update processing unit 104b. For each parameter of each unit in a specified layer, each parameter of a specified unit, or a specified parameter, the high-precision parameter update processing unit 104b may perform, at a predetermined high operation precision, parameter update processing that solves an optimization problem for an objective function (such as an error function) containing the parameter as an adjustment parameter and thereby updates that adjustment parameter, and may output the updated value. In doing so, the high-precision parameter update processing unit 104b may accept as inputs the values of the variables used in solving the optimization problem (which may include the parameter values before the update). Hereinafter, an operation performed in the parameter update processing may be referred to as a parameter update operation.
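As a concrete instance of a parameter update operation, the sketch below applies one gradient-descent step to a single unit z = a + w * x under a squared-error objective. The choice of objective, the gradient-descent rule, the learning rate, and the `to_fixed` format are all assumptions for illustration; the text only requires that some objective function be optimized at high operation precision.

```python
def to_fixed(value, frac_bits=4, total_bits=8):
    """Assumed low-precision format: signed fixed point with step 1/16."""
    step = 1.0 / (1 << frac_bits)
    limit = 1 << (total_bits - 1)
    return max(-limit, min(limit - 1, round(value / step))) * step


def update_unit(a, w, x, target, lr, quantize=lambda v: v):
    """One gradient step on E = 0.5 * (z - target)**2 with z = a + w * x."""
    err = (a + w * x) - target
    new_a = quantize(a - lr * err)       # dE/da = err
    new_w = quantize(w - lr * err * x)   # dE/dw = err * x
    return new_a, new_w


# The same small update, stored at high precision and at the assumed low precision.
high = update_unit(0.0, 1.0, 1.0, target=1.1, lr=0.1)
low = update_unit(0.0, 1.0, 1.0, target=1.1, lr=0.1, quantize=to_fixed)
```

In this example the high-precision call moves both parameters by about 0.01, while the low-precision call rounds them back to their old values because the change is smaller than the format's step. This is one commonly cited motivation for keeping updates at higher precision; it is offered here as background and is not a claim made in this passage.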
In the following, a circuit for performing parameter update operations is called a "parameter update circuit". In particular, a circuit that performs parameter update operations at an operation precision higher than that of the inference operations performed by the high-efficiency inference processing unit 103a is called a "high-precision parameter update circuit". To distinguish it from the high-precision parameter update circuit, a circuit that performs parameter update operations at the same operation precision as the inference operations performed by the high-efficiency inference processing unit 103a may be called a "high-efficiency parameter update circuit". The high-efficiency parameter update processing unit mentioned above (not shown) may be realized by such a high-efficiency parameter update circuit.
The parameter update circuit configurations shown below can be realized regardless of whether the circuit performs parameter update operations at high or low precision. That is, the high-efficiency parameter update processing unit 104a and the high-precision parameter update processing unit 104b may differ only in the precision of the variables, adders, or multipliers used for operations in the arithmetic circuit that implements each processing unit.
As with the inference circuit, the simplest example of a parameter update circuit is a configuration with a single multiplier-accumulator (MAC) 221, which combines a multiplier and an adder (see the arithmetic circuit 22a in FIG. 7(a) and the MAC 221 in FIG. 7(b)). The parameter update circuit can also be realized by, for example, the arithmetic circuits 22b, 22c, and 22d shown in FIGS. 8 to 10. That is, the arithmetic circuits shown in FIGS. 7 to 10 are also examples of arithmetic circuits for parameter update operations.
For example, when the high-precision parameter update processing unit 104b is realized by the arithmetic circuit 22a, 22b, 22c, or 22d, the variables used for operations in that circuit and the operations of its adders and multipliers need only support the high operation precision (second operation precision). The variables, the addition, and the multiplication need not all have the same precision. It is sufficient that the precision of any of the variables, the addition, or the multiplication used for parameter update operations in that circuit is higher than the precision of any of the variables, the addition, or the multiplication used for inference operations in the arithmetic circuit that realizes the high-efficiency inference processing unit 103a.
Conversely, when the high-efficiency parameter update processing unit 104a is realized by the arithmetic circuit 22a, 22b, 22c, or 22d, the variables used for operations in that circuit and the operations of its adders and multipliers need only support the low operation precision (first operation precision).
[Operation]
Next, the operation of the learning device 100 of the present embodiment is described. FIG. 11 is a flowchart showing an example of the operation of the learning device 100 of the present embodiment. The operation shown in FIG. 11 is executed, for example, under the control of the control unit 105.
In the example shown in FIG. 11, the control unit 105 first reads the pre-learning model from the pre-learning model storage unit 101 and reads the learning data from the learning data storage unit 102 (step S11).
Next, the control unit 105 controls the high-efficiency inference processing unit 103a and the high-precision inference processing unit 103b as necessary to perform inference processing in order on each unit in all layers from the first layer to the n-th layer (step S12: forward propagation). At this time, the control unit 105 has the high-efficiency inference processing unit 103a perform the inference processing for at least some of the units. The control unit 105 may have the high-efficiency inference processing unit 103a perform the inference processing for all units or for only some of them. When the high-efficiency inference processing unit 103a performs the inference processing for only some units in forward propagation, the control unit 105 may have the high-precision inference processing unit 103b perform the inference processing for the remaining units.
The high-efficiency inference processing unit 103a and the high-precision inference processing unit 103b perform inference processing for the specified layer or unit in accordance with instructions from the control unit 105.
Next, the control unit 105 controls the high-efficiency parameter update processing unit 104a and the high-precision parameter update processing unit 104b as necessary to perform parameter update processing on predetermined parameters among those used to calculate the outputs of the units in each layer (step S13: parameter update processing). At this time, the control unit 105 has the high-precision parameter update processing unit 104b perform the parameter update processing for at least some of the parameters. The control unit 105 may have the high-precision parameter update processing unit 104b perform the parameter update processing for all parameters or for only some of them. When the high-precision parameter update processing unit 104b performs the parameter update processing for only some parameters, the control unit 105 may have the high-efficiency parameter update processing unit 104a perform the parameter update processing for all of the remaining parameters or for only some of the remaining parameters. In the latter case, the parameter update processing itself is skipped for some parameters.
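The three-way split described for step S13 (high precision, low precision, or no update) can be expressed as a simple assignment plan. The enum, function name, and parameter names below are hypothetical and only illustrate the rule that at least one parameter goes to the high-precision unit.

```python
from enum import Enum


class UpdateMode(Enum):
    HIGH_PRECISION = "high_precision"   # handled by unit 104b
    LOW_PRECISION = "low_precision"     # handled by unit 104a
    SKIP = "skip"                       # update omitted


def plan_updates(param_names, high, low=()):
    """Assign each parameter to one update mode; unlisted parameters are skipped."""
    names, high, low = set(param_names), set(high), set(low)
    if not high:
        raise ValueError("at least one parameter must be updated at high precision")
    if high & low:
        raise ValueError("a parameter cannot be in both the high and low sets")
    if (high | low) - names:
        raise ValueError("unknown parameter name in high or low set")
    plan = {}
    for name in param_names:
        if name in high:
            plan[name] = UpdateMode.HIGH_PRECISION
        elif name in low:
            plan[name] = UpdateMode.LOW_PRECISION
        else:
            plan[name] = UpdateMode.SKIP
    return plan


plan = plan_updates(["w1", "w2", "w3"], high=["w3"], low=["w2"])
```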
The high-efficiency parameter update processing unit 104a and the high-precision parameter update processing unit 104b perform parameter update processing for the specified parameters in accordance with instructions from the control unit 105.
Finally, the control unit 105 stores the learned model, including the parameters updated in step S13, in the post-learning model storage unit 107 (step S14).
As a variation of the above operation, when a plurality of pieces of learning data are held, the operations of steps S11 to S14 may be repeated once for each piece of learning data. In that case, the learned model obtained as the learning result for the preceding piece of learning data is used as the pre-learning model for learning on the next piece.
Alternatively, when a plurality of pieces of learning data are held, the operations of steps S12 to S13 may be repeated once for each piece of learning data.
Furthermore, regardless of the number of pieces of learning data, the repetition of steps S11 to S14 or of steps S12 to S14 described above may itself be repeated a plurality of times using the same learning data (epoch processing).
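The repetition schemes above amount to a nested loop: an outer loop over epochs and an inner loop over pieces of learning data, with each result serving as the starting model for the next pass. The following sketch shows that control flow with caller-supplied `forward` and `update` callables. The single-weight model and the update rule in the usage lines are assumptions chosen to keep the example small.

```python
def train(model, dataset, forward, update, epochs=1):
    """Repeat inference (S12) and parameter update (S13) per sample and per epoch."""
    for _ in range(epochs):                           # epoch processing
        for sample in dataset:                        # one pass per piece of learning data
            outputs = forward(model, sample)          # step S12: forward propagation
            model = update(model, sample, outputs)    # step S13: parameter update
    return model                                      # the caller stores it (step S14)


# Hypothetical one-parameter model z = w * x, trained toward t = 2 * x.
def forward(w, sample):
    x, _ = sample
    return w * x


def update(w, sample, z):
    x, t = sample
    return w - 0.1 * (z - t) * x


trained = train(0.0, [(1.0, 2.0), (2.0, 4.0)], forward, update, epochs=20)
```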
In the forward propagation of step S12, the range in which inference processing is performed at the low operation precision (the low-precision inference range) need not only be determined in advance. It may also be made specifiable by the user, or changed for each piece of learning data or for each repetition of the epoch processing.
In the parameter update processing of step S13, the range in which parameter update processing is performed at the high operation precision (the high-precision parameter update range) may, for example, be limited to the fully connected layers. In addition, the high-precision parameter update range, the range in which parameter update processing is performed at the low operation precision (the low-precision parameter update range), and the range in which no parameter update processing is performed need not only be determined in advance. They may also be made specifiable by the user, or changed each time the processing is performed (for each piece of learning data or for each repetition of the epoch processing).
FIGS. 12 and 13 are flowcharts showing more specific operation examples of the learning device 100 of the present embodiment. The operation examples shown in FIGS. 12 and 13 illustrate the operation of each step with a focus on the hardware constituting the learning device 100. The hardware configuration is assumed to be that shown in FIG. 4.
In the example shown in FIG. 12, the low-precision arithmetic circuit 11, acting as the high-efficiency inference processing unit 103a, first reads the learning data and the pre-learning model from the memory 13 in response to an instruction from the control device 14, which acts as the control unit 105 (step S111).
Next, the low-precision arithmetic circuit 11 performs part of the forward propagation at the low operation precision (step S112). In this example, that part is the inference operations that calculate the output of each unit in each of the first to (k-1)-th layers. The low-precision arithmetic circuit 11 then stores the operation result of step S112 (in this example, the outputs of the units of the (k-1)-th layer) in the memory 13 (step S113).
In this example, the pre-learning model is assumed to be a neural network with a multilayer structure of n+1 layers, from the 0th layer to the n-th layer, where the 0th layer is the input layer and the n-th layer is the output layer. The (k-1)-th layer is an intermediate layer located after the input layer (0th layer) and before the output layer (n-th layer). That is, k is an integer satisfying 0 < k-1 < n.
Next, the high-precision arithmetic circuit 12, acting as the high-precision inference processing unit 103b, reads the operation result stored in step S113 (the outputs of the units of the (k-1)-th layer) in response to an instruction from the control device 14 (step S211).
The high-precision arithmetic circuit 12 then performs the rest of the forward propagation at the high operation precision (step S212). In this example, that is the inference operations that calculate the output of each unit in each of the k-th to n-th layers.
Next, the high-precision arithmetic circuit 12, acting as the high-precision parameter update processing unit 104b, performs, in response to an instruction from the control device 14, a parameter update operation with high arithmetic precision to update the parameters (such as connection weights with other units) of each unit in some of the layers (in this example, the k-th to n-th layers) (step S212). The high-precision arithmetic circuit 12 then stores the result of step S212 (in this example, the updated parameters of each unit in each of the k-th to n-th layers) in the memory 13 (step S213).
The updated parameters stored as the result in step S213 correspond to the learned model described above.
The example shown in FIG. 12 is an operation example in which the low-precision arithmetic circuit 11, as the high-efficiency inference processing unit 103a, first performs inference processing for some of the layers, and the high-precision arithmetic circuit 12, as the high-precision parameter update processing unit 104b, then performs inference processing and parameter update processing for the remaining layers.
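The flow of FIG. 12 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the circuit described here: half-precision rounding (`to_low`) stands in for the low arithmetic precision, Python floats stand in for the high arithmetic precision, every layer uses ReLU, the loss is squared error, and the names `to_low`, `layer`, and `train_step` are hypothetical.

```python
import struct

def to_low(v):
    """Round a float to IEEE 754 half precision (stand-in for low arithmetic precision)."""
    return struct.unpack("<e", struct.pack("<e", v))[0]

def layer(a, W, quantize=None):
    """ReLU layer; W[i][j] is the weight from input unit i to output unit j."""
    out = [max(sum(a[i] * W[i][j] for i in range(len(a))), 0.0)
           for j in range(len(W[0]))]
    return [quantize(v) for v in out] if quantize else out

def train_step(W, x, t, k, lr=0.1):
    n = len(W)                                  # W[j-1] feeds layer j, for j = 1..n
    assert 0 < k - 1 < n
    # S111-S113: low-precision forward pass through layers 1..k-1.
    # Only stored values are rounded; accumulation still uses Python floats.
    a = [to_low(v) for v in x]
    for j in range(1, k):
        a = layer(a, [[to_low(w) for w in row] for row in W[j - 1]], to_low)
    memory = a                                  # output of layer k-1
    # S211-S212: high-precision forward pass through layers k..n.
    acts = [memory]
    for j in range(k, n + 1):
        acts.append(layer(acts[-1], W[j - 1]))
    # High-precision parameter update for layers k..n only.
    delta = [y - target for y, target in zip(acts[-1], t)]
    for j in range(n, k - 1, -1):
        a_in, a_out = acts[j - k], acts[j - k + 1]
        dz = [d if o > 0 else 0.0 for d, o in zip(delta, a_out)]   # ReLU derivative
        delta = [sum(W[j - 1][i][c] * dz[c] for c in range(len(dz)))
                 for i in range(len(a_in))]     # uses the weights before the update
        W[j - 1] = [[W[j - 1][i][c] - lr * a_in[i] * dz[c] for c in range(len(dz))]
                    for i in range(len(a_in))]
    return W
```

Calling `train_step(W, x, t, k=2)` on a three-layer model leaves the weights of layer 1 untouched and updates only layers 2 and 3, which mirrors the split between the two circuits.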
In the example shown in FIG. 13, the low-precision arithmetic circuit 11, acting as the high-efficiency inference processing unit 103a, first reads the learning data and the pre-learning model from the memory 13 in response to an instruction from the control device 14, which acts as the control unit 105 (step S121).
Next, the low-precision arithmetic circuit 11 performs the forward propagation (in this example, the inference operation that calculates the output of each unit in each of the first to n-th layers) with low arithmetic precision (step S122). The low-precision arithmetic circuit 11 then stores the result of step S122 (in this example, the output of the units of the n-th layer, which is the output layer) in the memory 13 (step S123).
In this example as well, the pre-learning model is assumed to be a multilayer neural network of n+1 layers, from the 0th layer to the n-th layer, with the input layer as the 0th layer and the output layer as the n-th layer.
Next, the high-precision arithmetic circuit 12, acting as the high-precision inference processing unit 103b, reads the result stored in step S123 (the output of the units of the n-th layer, which is the output layer) in response to an instruction from the control device 14 (step S221).
Next, in response to an instruction from the control device 14, the high-precision arithmetic circuit 12 performs a parameter update operation with high arithmetic precision to update the parameters (such as connection weights with other units) of each unit in some of the layers (in this example, the k-th to n-th layers) (step S222). The high-precision arithmetic circuit 12 then stores the result of step S222 (in this example, the updated parameters of each unit in each of the k-th to n-th layers) in the memory 13 (step S223).
The updated parameters stored as the result in step S223 correspond to the learned model described above.
The example shown in FIG. 13 is an operation example in which the low-precision arithmetic circuit 11, as the high-efficiency inference processing unit 103a, performs inference processing for all layers, and the high-precision arithmetic circuit 12, as the high-precision parameter update processing unit 104b, then performs parameter update processing for some of the layers.
After step S213 in FIG. 12 or step S223 in FIG. 13, the low-precision arithmetic circuit 11 may additionally perform the operation shown in FIG. 14 as the high-efficiency parameter update processing unit 104a.
That is, the low-precision arithmetic circuit 11, as the high-efficiency parameter update processing unit 104a, reads the updated parameters of each unit in each of the k-th to n-th layers stored in the memory 13 (step S231).
Next, the low-precision arithmetic circuit 11 performs a parameter update operation with low arithmetic precision to update the parameters (such as connection weights with other units) of each unit in the remaining layers (in this example, the first to (k-1)-th layers) (step S232). The low-precision arithmetic circuit 11 then stores the result of step S232 (in this example, the updated parameters of each unit in each of the first to (k-1)-th layers) in the memory 13 (step S233).
In this case, the updated parameters stored in step S213 or step S223 together with the updated parameters stored in step S233 correspond to the learned model described above.
The operations shown in FIGS. 12 to 14 are examples of learning processing for a single item of learning data. When multiple items of learning data are held, the above operation, or each operation step included in it, may therefore be repeated once per item of learning data. Regardless of the number of items of learning data, the above operation or each of its operation steps may also be repeated multiple times using the same learning data (epoch processing). The k-th to n-th layers, which form the high-precision parameter update range in the above operation, may be fully connected layers, and k may be specified by the user or changed each time processing is performed.
As described above, according to this embodiment, the computation of the learning algorithm is divided into inference processing and parameter update processing, at least part of the inference processing is computed with low arithmetic precision, and at least part of the parameter update processing is computed with high arithmetic precision. This optimizes the portion of the computation that requires high arithmetic precision, so learning with sufficient accuracy becomes possible while power consumption is reduced.
Embodiment 2.
Next, a second embodiment of the present invention will be described. FIG. 15 is a block diagram illustrating a configuration example of the main part of the data processing device according to the second embodiment. The data processing device 300 illustrated in FIG. 15 includes a low-precision arithmetic processing unit 31, a high-precision arithmetic processing unit 32, a communication path 33, and a data conversion unit 34.
The low-precision arithmetic processing unit 31 is a processing unit that performs a predetermined operation with relatively low arithmetic precision. Here, relatively low arithmetic precision means any arithmetic precision lower than that of the operations performed by the high-precision arithmetic processing unit 32.
The high-precision arithmetic processing unit 32 is a processing unit that performs a predetermined operation with relatively high arithmetic precision. Here, relatively high arithmetic precision means any arithmetic precision higher than that of the operations performed by the low-precision arithmetic processing unit 31.
The low-precision arithmetic processing unit 31 may be, for example, the high-efficiency inference processing unit 103a or the high-efficiency parameter update processing unit 104a described above. The high-precision arithmetic processing unit 32 may be, for example, the high-precision inference processing unit 103b or the high-precision parameter update processing unit 104b described above. In this embodiment as well, "precision" or "arithmetic precision" refers to a measure of the width and granularity of the value range of the numerical data used in the operations actually performed by the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32 (more specifically, a measure of the width and granularity of the value range determined by the bit width, the handling of the decimal point, and so on in the arithmetic circuit that implements the processing unit).
In this example, the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32 are connected via the communication path 33 and the data conversion unit 34. The data conversion unit 34 is provided between the high-precision arithmetic processing unit 32 and the communication path 33.
The communication path 33 may be realized by, for example, a bus. It may also be realized by a connection circuit (interconnect) provided inside a chip. In addition to the bus or connection circuit, the communication path 33 may include a memory (such as an external memory or a buffer) connected to that bus or connection circuit.
The data conversion unit 34 performs a predetermined conversion on data exchanged between the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32. The conversion is performed so that, for example, the communication (data exchange) on the communication path 33 carries data at the arithmetic precision that yields the smaller data amount (communication volume per data item).
For example, the data conversion unit 34 converts transmitted and received data so that each data item passing through the communication path 33 has whichever of the two arithmetic precisions, that of the low-precision arithmetic processing unit 31 or that of the high-precision arithmetic processing unit 32, gives the smaller data amount. If the data amounts are equal, the data is converted to the lower arithmetic precision. In the configuration of FIG. 15, the data conversion unit 34 need only convert data so that each data item passing through the communication path 33 has the arithmetic precision of the low-precision arithmetic processing unit 31. Matching the lower arithmetic precision minimizes the loss of precision caused by the data exchange and makes data conversion on the low-precision arithmetic processing unit 31 side unnecessary.
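The selection rule above (smaller data amount first, lower arithmetic precision on a tie) can be written as a small sketch. The `Precision` record, its `rank` field used to order precisions, and the function name `wire_precision` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Precision:
    name: str
    bits: int   # data amount per value on the communication path
    rank: int   # hypothetical ordering: a higher rank means higher arithmetic precision

def wire_precision(a: Precision, b: Precision) -> Precision:
    """Pick the precision with the smaller data amount; on a tie, the lower precision."""
    return min(a, b, key=lambda p: (p.bits, p.rank))

INT16 = Precision("INT16", 16, 0)
FP16 = Precision("FP16", 16, 1)
FP32 = Precision("FP32", 32, 2)
```

With these assumed definitions, `wire_precision(INT16, FP32)` and `wire_precision(FP16, INT16)` both select `INT16`.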
Here, data conversion includes type conversion that matches the data type to that of the lower-precision processing unit among the communication endpoints, data compression (in particular, numerical data compression such as numerical sequence compression or reduction of the number of digits), and the combining of two or more converted data items.
For example, the data conversion unit 34 receives, via the communication path 33, data transmitted from the low-precision arithmetic processing unit 31 toward the high-precision arithmetic processing unit 32, converts the received data (low-precision data) into data of the arithmetic precision handled by the high-precision arithmetic processing unit 32 (high arithmetic precision), and passes it to the high-precision arithmetic processing unit 32. The data conversion unit 34 also receives, for example, data transmitted from the high-precision arithmetic processing unit 32 toward the low-precision arithmetic processing unit 31, converts the received data (high-precision data) into data of the arithmetic precision handled by the low-precision arithmetic processing unit 31 (low arithmetic precision), and sends it out onto the communication path 33.
For example, when the arithmetic precision of the low-precision arithmetic processing unit 31 (here, the data type of the numerical values used in the operations) is 16-bit integer (INT16) and that of the high-precision arithmetic processing unit 32 is 32-bit floating point (FP32), the data conversion unit 34 need only convert data so that the data passing through the communication path 33 is 16-bit integer data. Likewise, when the arithmetic precision of the low-precision arithmetic processing unit 31 is 16-bit integer (INT16) and that of the high-precision arithmetic processing unit 32 is 16-bit floating point (FP16), the data conversion unit 34 need only convert data so that the data passing through the communication path 33 is 16-bit integer data.
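A minimal sketch of such a conversion at the high-precision endpoint follows, for the INT16 and FP32 case. The text does not specify how floating-point values map to integers, so the fixed-point scale `FRAC_BITS`, the saturation behavior, and the function names are assumptions made only for illustration.

```python
FRAC_BITS = 8                                  # hypothetical fixed-point scale
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1

def high_to_wire(value: float) -> int:
    """High-precision float -> INT16 for the communication path (rounds, then saturates)."""
    q = round(value * (1 << FRAC_BITS))
    return max(INT16_MIN, min(INT16_MAX, q))

def wire_to_high(q: int) -> float:
    """INT16 received from the communication path -> float for the high-precision side."""
    return q / (1 << FRAC_BITS)
```

Under these assumptions, `high_to_wire(1.5)` gives `384` and `wire_to_high(384)` gives `1.5`; values outside the representable range saturate at the INT16 limits.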
The communication of data over the communication path 33 may be one-way (for example, only transmission from the low-precision arithmetic processing unit 31 to the high-precision arithmetic processing unit 32, or only transmission from the high-precision arithmetic processing unit 32 to the low-precision arithmetic processing unit 31). In that case, the data conversion unit 34 need only perform the data conversion corresponding to the communication that actually takes place.
In this embodiment, the data conversion unit 34 may be realized by a dedicated data conversion circuit that performs data conversion designed to match the configuration and operation of the data processing device 300. Implementing the data conversion unit 34 as a dedicated circuit eliminates general-purpose overhead, such as reading setting values and states and branching on them, which further improves efficiency. Dedicating the data conversion also makes it easy to convert multiple data items together, or to convert multiple data items in parallel and collect the results for a single batch transmission, which improves efficiency further still. Here, parallel data conversion followed by collection of the results is one example of data conversion that combines the conversion of each data item with the combining of the converted data. The data conversion unit 34 may realize such data conversion by, for example, SIMD (single instruction, multiple data) operations.
FIG. 16 is a block diagram showing another configuration example of the data processing device of the second embodiment. In the example shown in FIG. 15, the data conversion unit 34 is provided only on the high-precision arithmetic processing unit 32 side, but a data conversion unit may also be provided on the low-precision arithmetic processing unit 31 side. The data processing device 300 shown in FIG. 16 differs from the configuration shown in FIG. 15 in that it further includes a data conversion unit 35 between the low-precision arithmetic processing unit 31 and the communication path 33. That is, in this example, the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32 are connected via the data conversion unit 35, the communication path 33, and the data conversion unit 34.
The data conversion unit 35 performs a predetermined conversion on data exchanged between the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32.
In this example, the data conversion units 34 and 35 convert data so that the communication (data exchange) on the communication path 33 carries data at an arithmetic precision whose data amount is even smaller than that of data communication at the arithmetic precision handled by the low-precision arithmetic processing unit 31.
For example, the data conversion units 34 and 35 convert transmitted and received data so that each data item passing through the communication path 33 has an arithmetic precision (hereinafter, ultra-low arithmetic precision) whose data amount is smaller than that of communication at the arithmetic precision of the low-precision arithmetic processing unit 31.
In this example, the data conversion unit 34 receives, for example, data transmitted from the low-precision arithmetic processing unit 31 toward the high-precision arithmetic processing unit 32 via the data conversion unit 35 and the communication path 33, converts the received data (ultra-low-precision data produced by the data conversion unit 35) into data of the arithmetic precision handled by the high-precision arithmetic processing unit 32 (high arithmetic precision), and passes it to the high-precision arithmetic processing unit 32. The data conversion unit 34 also receives, for example, data transmitted from the high-precision arithmetic processing unit 32 toward the low-precision arithmetic processing unit 31, converts the received data (high-precision data) into ultra-low-precision data, and sends it out onto the communication path 33.
The data conversion unit 35 receives, for example, data transmitted from the high-precision arithmetic processing unit 32 toward the low-precision arithmetic processing unit 31 via the data conversion unit 34 and the communication path 33, converts the received data (ultra-low-precision data produced by the data conversion unit 34) into data of the arithmetic precision handled by the low-precision arithmetic processing unit 31 (low arithmetic precision), and passes it to the low-precision arithmetic processing unit 31. The data conversion unit 35 also receives, for example, data transmitted from the low-precision arithmetic processing unit 31 toward the high-precision arithmetic processing unit 32, converts the received data (low-precision data) into ultra-low-precision data, and sends it out onto the communication path 33.
For example, when the arithmetic precision of the low-precision arithmetic processing unit 31 (here, the data type of the numerical values used in the operations) is 16-bit integer (INT16) and that of the high-precision arithmetic processing unit 32 is 32-bit floating point (FP32), the data conversion units 34 and 35 may compress the data so that the data passing through the communication path 33 becomes 12-bit integer (INT12) or 8-bit integer (INT8) data, both of which have a smaller data amount than INT16. When compressing data, the data conversion units 34 and 35 perform numerical data compression that lowers only the precision (for example, by dropping low-order bits) so that the data does not lose its meaning as numerical data.
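Dropping low-order bits can be sketched as a pair of shifts. The sketch assumes the INT16 to INT8 case and simple truncation; the function names are hypothetical, and a real converter might round rather than truncate.

```python
SHIFT = 8   # INT16 -> INT8 drops the 8 low-order bits

def compress(v16: int) -> int:
    """Keep the 8 high-order bits; the arithmetic right shift preserves the sign."""
    return v16 >> SHIFT

def expand(v8: int) -> int:
    """Restore the INT16 scale; the dropped low-order bits come back as zero."""
    return v8 << SHIFT
```

The value keeps its meaning as a number: `expand(compress(v))` differs from `v` by less than 256 for any INT16 input, so only precision is lost.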
The data conversion units 34 and 35 can also compress data by exploiting the characteristics of the activation functions used in deep learning. For example, when a step function is used as the activation function, each data item can be compressed to 1 bit. When ReLU (the ramp function) is used, the sign bit of the data can be dropped.
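The step-function case can be sketched as follows: each activation output is 0 or 1, so eight of them fit in one byte. The function names, the bit order within a byte, and the convention that an input of exactly 0 maps to 0 are assumptions for illustration.

```python
def pack_step_outputs(pre_activations):
    """Apply a step function (1 if x > 0 else 0) and pack eight outputs per byte."""
    out = bytearray((len(pre_activations) + 7) // 8)
    for i, x in enumerate(pre_activations):
        if x > 0:
            out[i // 8] |= 1 << (i % 8)        # least significant bit first
    return bytes(out)

def unpack_step_outputs(packed, count):
    """Recover the 0/1 outputs from the packed bytes."""
    return [(packed[i // 8] >> (i % 8)) & 1 for i in range(count)]
```

Nine values that would occupy 18 bytes as INT16 travel as 2 bytes here. The ReLU case is simpler still: the outputs are never negative, so the sign bit carries no information and can be omitted.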
When performing data conversion that reduces the number of bits, the data conversion units 34 and 35 may also pack multiple data items whose bit widths are not a whole number of bytes into a contiguous block, and split data packed in this way back into the individual data items (pack/unpack processing). Such pack/unpack processing can also be made more efficient by implementing it in dedicated hardware.
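Pack/unpack processing for an odd bit width can be sketched with 12-bit values. The sketch assumes unsigned values and a big-endian bit layout with zero padding at the end; both choices and the function names are hypothetical.

```python
def pack12(values):
    """Pack unsigned 12-bit integers back to back into bytes."""
    acc = 0
    for v in values:
        assert 0 <= v < (1 << 12)
        acc = (acc << 12) | v
    nbits = 12 * len(values)
    pad = -nbits % 8                           # zero bits appended to fill the last byte
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")

def unpack12(data, count):
    """Split packed bytes back into `count` unsigned 12-bit integers."""
    acc = int.from_bytes(data, "big") >> (len(data) * 8 - 12 * count)
    return [(acc >> (12 * (count - 1 - i))) & 0xFFF for i in range(count)]
```

Two 12-bit values occupy 3 bytes instead of the 4 bytes they would need as INT16, and three values occupy 5 bytes instead of 6.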
Reducing the amount of data passing through the communication path 33 in this way speeds up data exchange between cores (arithmetic circuits) of different arithmetic precision. Furthermore, when data is exchanged between cores via a memory, the amount of memory used for the exchange is reduced, so the power consumed by memory use can also be reduced.
When there are two or more combinations of cores of different arithmetic precision that exchange data, the data conversion unit 34 or the data conversion unit 35 described above may be provided, for each combination, at one or both endpoints of the communication path used for inter-core communication in that combination.
Next, an overview of the present invention will be described. FIG. 17 is a block diagram showing an overview of the data processing device of the present invention. The data processing device 500 shown in FIG. 17 includes low-precision arithmetic processing means 501, high-precision arithmetic processing means 502, and first data conversion means 504.
The low-precision arithmetic processing means 501 (for example, the low-precision arithmetic processing unit 31) performs a predetermined operation with a first precision.
The high-precision arithmetic processing means 502 (for example, the high-precision arithmetic processing unit 32) performs a predetermined operation with a second precision higher than the first precision.
The first data conversion means 504 (for example, the data conversion unit 34) is provided at the endpoint, on the high-precision arithmetic processing means 502 side, of a communication path 503 used to transfer data between the high-precision arithmetic processing means 502 and the low-precision arithmetic processing means 501.
The first data conversion means 504 performs a predetermined conversion on data passing between the communication path 503 and the high-precision arithmetic processing means 502 so that the data exchanged with the connected high-precision arithmetic processing means 502 is data that the high-precision arithmetic processing means 502 can handle, the amount of data passing through the communication path 503 is no greater than the amount when data of the first precision is used, and the precision of the data passing through the communication path 503 is no higher than the first precision.
With this configuration, efficiency can be improved even for processing that mixes operations requiring high precision with operations that do not.
FIG. 18 is a configuration diagram showing a configuration example of the data processing circuit of the present invention. The data processing circuit 600 shown in FIG. 18 includes a low-precision arithmetic circuit 601, a high-precision arithmetic circuit 602, and a first data conversion circuit 604.
The low-precision arithmetic circuit 601 (for example, the low-precision arithmetic processing unit 31 or the low-precision arithmetic circuit 11) performs a predetermined operation with a first precision.
The high-precision arithmetic circuit 602 (for example, the high-precision arithmetic processing unit 32 or the high-precision arithmetic circuit 12) performs a predetermined operation with a second precision higher than the first precision.
The first data conversion circuit 604 (for example, the data conversion unit 34) is provided at the endpoint, on the high-precision arithmetic circuit 602 side, of a communication path 603 used to transfer data between the high-precision arithmetic circuit 602 and the low-precision arithmetic circuit 601, and performs a predetermined conversion on data passing between the communication path 603 and the high-precision arithmetic circuit 602.
The data processing circuit 600 is configured so that the data exchanged between the first data conversion circuit 604 and the connected high-precision arithmetic circuit 602 is data handled by the high-precision arithmetic circuit 602, the amount of data passing through the communication path 603 is no greater than the amount when data of the first precision is used, and the precision of the data passing through the communication path 603 is no higher than the first precision.
This configuration also improves the efficiency of processing that mixes operations requiring high precision with operations that do not.
As shown in FIG. 19, the data processing circuit 600 may further include a second data conversion circuit 605 that is provided at the end point of the communication path 603 on the low-precision arithmetic circuit 601 side and performs a predetermined conversion on data passing between the communication path 603 and the low-precision arithmetic circuit 601.
In such a data processing circuit 600, the configuration may further be such that the data passed between the second data conversion circuit 605 and the low-precision arithmetic circuit 601 to which it is connected is data handled by the low-precision arithmetic circuit 601, the amount of data passing through the communication path 603 is smaller than the amount of data when data of the first precision is used, and the precision of the data passing through the communication path 603 is lower than the first precision.
Such a configuration can further improve the efficiency of processing in which operations that require high precision and operations that do not require high precision are mixed.
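The two-converter configuration can be sketched in the same way. The following Python fragment is an illustration under stated assumptions, not the circuit of FIG. 19: an 8-bit signed fixed-point code with 4 fractional bits stands in for a channel precision lower than the first precision, binary16 stands in for the first precision, and all names are hypothetical.

```python
import struct

# Hypothetical channel format: 8-bit signed fixed point, 4 fractional bits.
STEP = 1.0 / 16


def to_channel(value):
    """Quantize a value to one signed byte, saturating at the 8-bit range."""
    code = max(-128, min(127, round(value / STEP)))
    return struct.pack("<b", code)


def from_channel(payload):
    """Recover the fixed-point value carried by one byte."""
    (code,) = struct.unpack("<b", payload)
    return code * STEP


def as_first_precision(value):
    """Round a value to binary16, standing in for the first precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def high_to_low(value):
    """First converter encodes; second converter decodes for the low side."""
    payload = to_channel(value)
    return payload, as_first_precision(from_channel(payload))


def low_to_high(value):
    """Second converter encodes; first converter decodes for the high side."""
    payload = to_channel(as_first_precision(value))
    return payload, float(from_channel(payload))


payload, received = high_to_low(1.3)
```

Each arithmetic side receives values in a format it can handle, while only one byte per value crosses the channel, which is less than the two bytes a binary16 value would need.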
The above embodiments can also be described as in the following supplementary notes.
(Supplementary Note 1) A data processing device comprising: low-precision arithmetic processing means for performing a predetermined operation with a first precision; high-precision arithmetic processing means for performing a predetermined operation with a second precision higher than the first precision; and first data conversion means provided at the end point, on the high-precision arithmetic processing means side, of a communication path for transferring data between the high-precision arithmetic processing means and the low-precision arithmetic processing means, wherein the first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means such that the data passed to and from the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is equal to or less than the amount of data when data of the first precision is used, and the precision of the data passing through the communication path is equal to or lower than the first precision.
(Supplementary Note 2) The data processing device according to Supplementary Note 1, wherein the first data conversion means receives, from the communication path, data passed from the low-precision arithmetic processing means to the high-precision arithmetic processing means as data of the first precision and converts the received first-precision data into data of a precision that the high-precision arithmetic processing means can handle, and the first data conversion means receives data passed from the high-precision arithmetic processing means to the low-precision arithmetic processing means as data that the high-precision arithmetic processing means can handle and converts the received data into data of a precision that the low-precision arithmetic processing means can handle.
(Supplementary Note 3) The data processing device according to Supplementary Note 1 or 2, further comprising second data conversion means provided at the end point of the communication path on the low-precision arithmetic processing means side, wherein the first data conversion means and the second data conversion means each perform a predetermined conversion on data passing between the communication path and the arithmetic processing means to which that conversion means is connected such that the data passed to and from the connected arithmetic processing means is data that the connected arithmetic processing means can handle, the amount of data passing through the communication path is smaller than the amount of data when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision.
(Supplementary Note 4) The data processing device according to Supplementary Note 3, wherein the first data conversion means and the second data conversion means each receive, from the communication path, data passed from the connected arithmetic processing means to the counterpart arithmetic processing means as data of a predetermined third precision lower than the first precision and convert the received third-precision data into data of a precision that the connected arithmetic processing means can handle, and the first data conversion means and the second data conversion means each receive data passed from the connected arithmetic processing means to the counterpart arithmetic processing means as data that the connected arithmetic processing means can handle and convert the received data into data of a precision that the counterpart arithmetic processing means can handle.
(Supplementary Note 5) The data processing device according to Supplementary Note 3 or 4, wherein at least one of the first data conversion means and the second data conversion means, when sending converted data to the communication path, sends a plurality of pieces of converted data together as a bundle, and at least one of the first data conversion means and the second data conversion means receives the bundled plurality of pieces of converted data from the communication path, separates the received bundle, and converts each separated piece of data into data of a precision that the connected arithmetic processing means can handle.
(Supplementary Note 6) The data processing device according to any one of Supplementary Notes 3 to 5, wherein the conversions performed by the first data conversion means and the second data conversion means are predetermined and fixed.
(Supplementary Note 7) The data processing device according to any one of Supplementary Notes 1 to 6, wherein the data processing device is a learning device that learns a predetermined discriminant model composed of two or more units connected in layers, and comprises learning means that, when learning data is input, performs inference processing for calculating the output of each unit of the discriminant model in a predetermined order and parameter update processing for updating, based on the result of the inference processing, at least some of the parameters used to calculate the output of each unit, the learning means including: high-efficiency inference means that, as the low-precision arithmetic processing means, performs a specified operation among the operations performed in the inference processing with a first operation precision; and high-precision parameter update means that, as the high-precision arithmetic processing means, performs a specified operation among the operations performed in the parameter update processing with a second operation precision higher than the first operation precision.
(Supplementary Note 8A) A data processing circuit comprising: a low-precision arithmetic circuit that performs a predetermined operation with a first precision; a high-precision arithmetic circuit that performs a predetermined operation with a second precision higher than the first precision; and a first data conversion circuit that is provided at the end point, on the high-precision arithmetic circuit side, of a communication path for transferring data between the high-precision arithmetic circuit and the low-precision arithmetic circuit and performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic circuit, wherein the data passed between the first data conversion circuit and the high-precision arithmetic circuit to which it is connected is data handled by the high-precision arithmetic circuit, the amount of data passing through the communication path is equal to or less than the amount of data when data of the first precision is used, and the precision of the data passing through the communication path is equal to or lower than the first precision.
(Supplementary Note 8B) The data processing circuit according to Supplementary Note 8A, further comprising a second data conversion circuit that is provided at the end point of the communication path on the low-precision arithmetic circuit side and performs a predetermined conversion on data passing between the communication path and the low-precision arithmetic circuit, wherein the data passed between the second data conversion circuit and the low-precision arithmetic circuit to which it is connected is data handled by the low-precision arithmetic circuit, the amount of data passing through the communication path is smaller than the amount of data when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision.
(Supplementary Note 9) A data processing method, wherein first data conversion means, provided at the end point on the high-precision arithmetic processing means side of a communication path for transferring data between low-precision arithmetic processing means for performing a predetermined operation with a first precision and high-precision arithmetic processing means for performing a predetermined operation with a second precision higher than the first precision, performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means such that the data passed to and from the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is equal to or less than the amount of data when data of the first precision is used, and the precision of the data passing through the communication path is equal to or lower than the first precision.
(Supplementary Note 10) The data processing method according to Supplementary Note 9, wherein the first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means such that the data passed to and from the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is smaller than the amount of data when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision, and second data conversion means provided at the end point of the communication path on the low-precision arithmetic processing means side performs a predetermined conversion on data passing between the communication path and the low-precision arithmetic processing means such that the data passed to and from the connected low-precision arithmetic processing means is data that the low-precision arithmetic processing means can handle, the amount of data passing through the communication path is smaller than the amount of data when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision.
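The bundling described in Supplementary Note 5 can be sketched as follows. This is a minimal illustration that assumes a hypothetical 8-bit fixed-point channel code (step 1/16); the function names `bundle` and `unbundle` are illustrative and not taken from the specification.

```python
import struct

# Hypothetical channel format: 8-bit signed fixed point, 4 fractional bits.
STEP = 1.0 / 16


def bundle(values):
    """Convert each value to a channel code and send all codes as one payload."""
    codes = [max(-128, min(127, round(v / STEP))) for v in values]
    return struct.pack(f"<{len(codes)}b", *codes)


def unbundle(payload):
    """Split a received payload into codes and convert each one back."""
    codes = struct.unpack(f"<{len(payload)}b", payload)
    return [code * STEP for code in codes]


payload = bundle([1.0, -0.5, 0.25, 2.0])  # four values travel as one 4-byte unit
values = unbundle(payload)
```

Sending several converted values as one unit lets a channel whose transfer width exceeds the width of a single converted value be used without padding each value individually.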
Although the present invention has been described above with reference to the embodiments and examples, the present invention is not limited to those embodiments and examples. Various changes that can be understood by those skilled in the art may be made to the configuration and details of the present invention within the scope of the present invention.
The present invention is not limited to deep learning and is suitably applicable to any device that performs processing in which operations that require high precision and operations that do not require high precision are mixed, when that processing is to be performed while suppressing power consumption.
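As an illustration of such mixed processing, the following Python sketch trains a one-parameter model with inference performed at a low precision and the parameter update performed at a high precision, in the spirit of Supplementary Note 7. The model, the squared-error gradient rule, the learning rate, and the choice of binary16 and binary64 are all assumptions made for this example and are not taken from the specification.

```python
import struct


def to_half(value):
    """Round a value to binary16, standing in for the first precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def infer_low(weight, x):
    """Inference step carried out entirely on binary16-rounded values."""
    return to_half(to_half(weight) * to_half(x))


def update_high(weight, x, predicted, target, rate):
    """Gradient step on the squared error, computed in binary64."""
    gradient = 2.0 * (predicted - target) * x
    return weight - rate * gradient


def train(samples, weight=0.0, rate=0.05, epochs=50):
    """Alternate low-precision inference and high-precision updates."""
    for _ in range(epochs):
        for x, target in samples:
            predicted = infer_low(weight, x)
            weight = update_high(weight, x, predicted, target, rate)
    return weight


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
learned_weight = train(samples)
```

The weight is kept and updated at the higher precision so that small gradient steps are not lost to rounding, while the more frequent inference arithmetic uses the cheaper format.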
REFERENCE SIGNS LIST
10 Arithmetic circuit
11 Low-precision arithmetic circuit
12 High-precision arithmetic circuit
13 Memory
14 Control device
15 Bus
51 Unit
52 Inter-unit connection
53 Inference processing
54 Parameter update processing
100 Learning device
101 Pre-learning model storage unit
102 Learning data storage unit
103a High-efficiency inference processing unit
103b High-precision inference processing unit
104a High-efficiency parameter update processing unit
104b High-precision parameter update processing unit
105 Control unit
106 Learning processing unit
107 Post-learning model storage unit
1000 Computer
1001 CPU
1002 Main storage device
1003 Auxiliary storage device
1004 Interface
1005 Display device
1006 Input device
1007 GPU
1008 Processor
21 Bus
22a, 22b, 22c, 22d Arithmetic circuit
221 MAC
222 Memory layer
223 Multiply-add tree
224 ALU
300 Data processing device
31 Low-precision arithmetic processing unit
32 High-precision arithmetic processing unit
33 Communication path
34, 35 Data conversion unit
500 Data processing device
501 Low-precision arithmetic processing means
502 High-precision arithmetic processing means
503 Communication path
504 First data conversion means
600 Data processing circuit
601 Low-precision arithmetic circuit
602 High-precision arithmetic circuit
603 Communication path
604 First data conversion circuit
605 Second data conversion circuit
90 Large-scale learning circuit

Claims (10)

  1.  A data processing device comprising:
     low-precision arithmetic processing means for performing a predetermined operation with a first precision;
     high-precision arithmetic processing means for performing a predetermined operation with a second precision higher than the first precision; and
     first data conversion means provided at the end point, on the high-precision arithmetic processing means side, of a communication path for transferring data between the high-precision arithmetic processing means and the low-precision arithmetic processing means,
     wherein the first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means such that the data passed to and from the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is equal to or less than the amount of data when data of the first precision is used, and the precision of the data passing through the communication path is equal to or lower than the first precision.
  2.  The data processing device according to claim 1, wherein
     the first data conversion means receives, from the communication path, data passed from the low-precision arithmetic processing means to the high-precision arithmetic processing means as data of the first precision, and converts the received first-precision data into data of a precision that the high-precision arithmetic processing means can handle, and
     the first data conversion means receives data passed from the high-precision arithmetic processing means to the low-precision arithmetic processing means as data that the high-precision arithmetic processing means can handle, and converts the received data into data of a precision that the low-precision arithmetic processing means can handle.
  3.  The data processing device according to claim 1 or 2, further comprising
     second data conversion means provided at the end point of the communication path on the low-precision arithmetic processing means side,
     wherein the first data conversion means and the second data conversion means each perform a predetermined conversion on data passing between the communication path and the arithmetic processing means to which that conversion means is connected such that the data passed to and from the connected arithmetic processing means is data that the connected arithmetic processing means can handle, the amount of data passing through the communication path is smaller than the amount of data when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision.
  4.  The data processing device according to claim 3, wherein
     the first data conversion means and the second data conversion means each receive, from the communication path, data passed from the connected arithmetic processing means to the counterpart arithmetic processing means as data of a predetermined third precision lower than the first precision, and convert the received third-precision data into data of a precision that the connected arithmetic processing means can handle, and
     the first data conversion means and the second data conversion means each receive data passed from the connected arithmetic processing means to the counterpart arithmetic processing means as data that the connected arithmetic processing means can handle, and convert the received data into data of a precision that the counterpart arithmetic processing means can handle.
  5.  The data processing device according to claim 3 or 4, wherein
     at least one of the first data conversion means and the second data conversion means, when sending converted data to the communication path, sends a plurality of pieces of converted data together as a bundle, and
     at least one of the first data conversion means and the second data conversion means receives the bundled plurality of pieces of converted data from the communication path, separates the received bundle, and converts each separated piece of data into data of a precision that the connected arithmetic processing means can handle.
  6.  The data processing device according to any one of claims 3 to 5, wherein
     the conversions performed by the first data conversion means and the second data conversion means are predetermined and fixed.
  7.  The data processing device according to any one of claims 1 to 6, wherein
     the data processing device is a learning device that learns a predetermined discriminant model composed of two or more units connected in layers,
     the data processing device comprises learning means that, when learning data is input, performs inference processing for calculating the output of each unit of the discriminant model in a predetermined order and parameter update processing for updating, based on the result of the inference processing, at least some of the parameters used to calculate the output of each unit, and
     the learning means includes:
     high-efficiency inference means that, as the low-precision arithmetic processing means, performs a specified operation among the operations performed in the inference processing with a first operation precision; and
     high-precision parameter update means that, as the high-precision arithmetic processing means, performs a specified operation among the operations performed in the parameter update processing with a second operation precision higher than the first operation precision.
  8.  A data processing circuit comprising:
     a low-precision arithmetic circuit that performs a predetermined operation with a first precision;
     a high-precision arithmetic circuit that performs a predetermined operation with a second precision higher than the first precision; and
     a first data conversion circuit that is provided at the end point, on the high-precision arithmetic circuit side, of a communication path for transferring data between the high-precision arithmetic circuit and the low-precision arithmetic circuit, and performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic circuit,
     wherein the data passed between the first data conversion circuit and the high-precision arithmetic circuit to which it is connected is data handled by the high-precision arithmetic circuit, and
     the amount of data passing through the communication path is equal to or less than the amount of data when data of the first precision is used, and the precision of the data passing through the communication path is equal to or lower than the first precision.
  9.  A data processing method, wherein first data conversion means, provided at the end point on the high-precision arithmetic processing means side of a communication path for transferring data between low-precision arithmetic processing means for performing a predetermined operation with a first precision and high-precision arithmetic processing means for performing a predetermined operation with a second precision higher than the first precision,
     performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means such that the data passed to and from the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is equal to or less than the amount of data when data of the first precision is used, and the precision of the data passing through the communication path is equal to or lower than the first precision.
  10.  The data processing method according to claim 9, wherein
     the first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means such that the data passed to and from the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is smaller than the amount of data when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision, and
     second data conversion means provided at the end point of the communication path on the low-precision arithmetic processing means side performs a predetermined conversion on data passing between the communication path and the low-precision arithmetic processing means such that the data passed to and from the connected low-precision arithmetic processing means is data that the low-precision arithmetic processing means can handle, the amount of data passing through the communication path is smaller than the amount of data when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision.
PCT/JP2018/025773 2018-07-06 2018-07-06 Data processing device, data processing circuit, and data processing method WO2020008643A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2020528664A JP7120308B2 (en) 2018-07-06 2018-07-06 DATA PROCESSING DEVICE, DATA PROCESSING CIRCUIT AND DATA PROCESSING METHOD
PCT/JP2018/025773 WO2020008643A1 (en) 2018-07-06 2018-07-06 Data processing device, data processing circuit, and data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/025773 WO2020008643A1 (en) 2018-07-06 2018-07-06 Data processing device, data processing circuit, and data processing method

Publications (1)

Publication Number Publication Date
WO2020008643A1 true WO2020008643A1 (en) 2020-01-09

Family

ID=69060857

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/025773 WO2020008643A1 (en) 2018-07-06 2018-07-06 Data processing device, data processing circuit, and data processing method

Country Status (2)

Country Link
JP (1) JP7120308B2 (en)
WO (1) WO2020008643A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210377122A1 (en) * 2020-05-26 2021-12-02 Synopsys, Inc. Mixed-precision neural networks

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002517038A (en) * 1998-05-27 2002-06-11 エイアールエム リミテッド Recirculation register file
JP2018010618A (en) * 2016-05-03 2018-01-18 イマジネイション テクノロジーズ リミテッド Convolutional neural network hardware configuration

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002008060A (en) 2000-06-23 2002-01-11 Hitachi Ltd Data processing method, recording medium and data processing device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIN, DARRYL D. ET AL.: "Fixed Point Quantization of Deep Convolutional Networks", ARXIV, 2 June 2016 (2016-06-02), XP055284812, Retrieved from the Internet <URL:https://arxiv.org/abs/1511.06393v3> [retrieved on 20180831] *

Also Published As

Publication number Publication date
JP7120308B2 (en) 2022-08-17
JPWO2020008643A1 (en) 2021-03-11

Similar Documents

Publication Publication Date Title
CN107729989B (en) Device and method for executing artificial neural network forward operation
Hotkar et al. Implementation of Low Power and area efficient carry select Adder
CN116894145A (en) Block floating point for neural network implementation
CN110543939B (en) Hardware acceleration realization device for convolutional neural network backward training based on FPGA
WO2021044244A1 (en) Machine learning hardware having reduced precision parameter components for efficient parameter update
JP7292297B2 (en) probabilistic rounding logic
US11620105B2 (en) Hybrid floating point representation for deep learning acceleration
Geng et al. CQNN: a CGRA-based QNN framework
Zhang et al. Implementation and optimization of the accelerator based on FPGA hardware for LSTM network
Ma et al. FPGA-based AI smart NICs for scalable distributed AI training systems
Li et al. A precision-scalable energy-efficient bit-split-and-combination vector systolic accelerator for NAS-optimized DNNs on edge
WO2020008643A1 (en) Data processing device, data processing circuit, and data processing method
KR102635978B1 (en) Mixed-precision multiply-and-accumulation tree structure to maximize memory bandwidth usage for computational acceleration of generative large language model
US11551087B2 (en) Information processor, information processing method, and storage medium
CN111178492B (en) Computing device, related product and computing method for executing artificial neural network model
WO2020008642A1 (en) Learning device, learning circuit, learning method, and learning program
Lu et al. A reconfigurable DNN training accelerator on FPGA
Ghosh et al. FPGA based implementation of a double precision IEEE floating-point adder
Su et al. Processing element architecture design for deep reinforcement learning with flexible block floating point exploiting signal statistics
Mehta et al. High performance training of deep neural networks using pipelined hardware acceleration and distributed memory
US11537859B2 (en) Flexible precision neural inference processing unit
WO2021044227A1 (en) Neural network circuitry having floating point format with asymmetric range
Hojabr et al. TaxoNN: a light-weight accelerator for deep neural network training
CN110705196A (en) Error-free adder based on random calculation
US20220180177A1 (en) An efficient method for VLSI implementation of useful neural network activation functions

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18925255; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2020528664; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 18925255; Country of ref document: EP; Kind code of ref document: A1)