WO2020008643A1

WO2020008643A1 - Data processing device, data processing circuit, and data processing method

Info

Publication number: WO2020008643A1
Application number: PCT/JP2018/025773
Authority: WO
Inventors: 芙美代鷹野; 誠也柴田; 竹中　崇; 浩明井上
Original assignee: 日本電気株式会社
Priority date: 2018-07-06
Filing date: 2018-07-06
Publication date: 2020-01-09
Also published as: JP7120308B2; JPWO2020008643A1

Abstract

A data processing device 500 comprises: a low-precision computation processing means 501 for performing prescribed computation with a first precision; a high-precision computation processing means 502 for performing prescribed computation with a second precision that is higher than the first precision; and a first data conversion means 504 provided at an end point, on the high-precision-computation-processing-means 502 side, of a communication path 503 for delivering data between the two computation processing means 501, 502. The first data conversion means 504 performs prescribed conversion on the data passing between the communication path 503 and the high-precision computation processing means 502 so that the data delivered to and from the high-precision computation processing means 502 can be handled by the computation processing means 502, the amount of data passing through the communication path 503 is smaller than or equal to a data amount in a case where data of the first precision is used, and the precision of data passing through the communication path 503 is less than or equal to the first precision.

Description

Data processing device, data processing circuit, and data processing method

The present invention relates to a data processing device, a data processing circuit, and a data processing method for performing data processing including calculations with two types of calculation accuracy.

With the spread of machine learning, further innovations are needed to cope with ever-changing situations.

To do so, various raw data acquired in the environment where the system is actually used must be incorporated into learning as learning data. In learning that uses learning data (machine learning), for example, the parameters of the arithmetic expressions and discriminants used in a predetermined learning device are adjusted based on the relationship between input and output indicated by the learning data. The learning device is, for example, a discrimination model that, when data is input, performs discrimination with respect to one or more labels.

Regarding the relationship between calculation resources and calculation accuracy in machine learning, Non-Patent Document 1, for example, describes an example of a learning calculation circuit and a learning method for executing deep learning of a neural network efficiently, in particular with low power consumption.

Non-Patent Document 2 describes an example of a learning method for deep learning in a CNN (Convolutional Neural Network) in which the plurality of convolutional layers is divided into layers whose weights are fixed and layers whose weights are updated (extended function layers), and the learning time is shortened by limiting the learning range in this way.

Non-Patent Document 3 describes an optimization example of accelerator design based on FPGA (Field-Programmable Gate Array) as an example of a circuit configuration for learning operation in machine learning.

Most machine learning using learning data has been performed in cloud environments, where large-scale high-precision arithmetic circuits can be constructed to support general-purpose learning algorithms.

However, depending on the site, there are various restrictions on data movement, such as limits on network bandwidth and the need to protect privacy. Therefore, a mechanism is desired that allows learning to be performed not in a cloud environment but in a device at the site (hereinafter referred to as the edge device layer). For that purpose, a learning method is desired that can obtain a sufficient recognition rate with fewer computer resources and thus lower power consumption.

According to the learning method described in Non-Patent Document 1, learning can reportedly be realized with lower power consumption by using a 16-bit fixed-point arithmetic circuit, compared with NVIDIA's TK1 (Jetson Kit), which performs learning using a 32-bit floating-point arithmetic circuit. However, this method reduces power consumption by narrowing the bit width of the arithmetic circuit that performs all the learning operations (all the operations for adjusting the parameters), at the cost of lower operation accuracy. It gives no consideration to the adverse effects of reducing the calculation accuracy of the arithmetic circuit itself. For example, it does not consider the possibility that calculation accuracy sufficient for the learning calculation is not ensured.

For example, an arithmetic circuit that performs deep learning carries out a multi-layer operation using a configuration in which a plurality of units are connected in layers. This multi-layer operation can be divided into a part that calculates the unit outputs of each layer (so-called inference processing, for example forward propagation processing) and a part that performs the calculation for updating the parameters (for example, weights) used in that calculation (so-called parameter update processing, for example back propagation processing). In particular, the parameter update processing can be said to correspond to the actual learning operation in machine learning. The calculation accuracy of the parameter update processing therefore greatly affects the recognition rate during operation, and the higher that accuracy, the better. On the other hand, the calculation accuracy of the inference processing does not need to be very high in many cases.

Therefore, if only the operations in the learning process that require high accuracy are performed with high accuracy, and the operations that do not require high accuracy are performed with low accuracy, learning can be made efficient while accuracy is maintained. Suppose, then, that a device performing a process in which operations requiring high accuracy and operations not requiring high accuracy are mixed has arithmetic circuits of two operation accuracies, and that each operation in the process is executed while its execution destination is switched between the circuits according to the accuracy the operation requires. In this case, data exchange between cores (arithmetic circuits) having different operation accuracies is essential in the device. To further improve the efficiency of a learning process that includes data exchange between cores having different operation accuracies, it is important to speed up that data exchange.

Note that the learning method described in Non-Patent Document 2 merely aims to reduce the learning time by limiting the learning range. It gives no consideration to improving the efficiency of a learning process that includes data exchange between cores having different calculation accuracies, and in particular to making that data exchange itself efficient. Likewise, the method described in Non-Patent Document 3 merely reduces the circuit scale and the calculation time by optimizing the configuration of the circuit that performs all the learning operations. It also gives no consideration to improving the efficiency of a learning process that includes data exchange between cores having different calculation accuracies, and in particular to the efficiency of that data exchange.

The present invention has been made in view of the above-described problems. Its purpose is to provide a data processing device, a data processing circuit, and a data processing method that can further increase the efficiency of a process in which operations that require high accuracy and operations that do not require high accuracy are mixed.

A data processing device according to the present invention includes: a low-precision arithmetic processing means that performs a predetermined operation with a first accuracy; a high-precision arithmetic processing means that performs a predetermined operation with a second accuracy higher than the first accuracy; and a first data conversion means provided at the end, on the high-precision arithmetic processing means side, of a communication path for transferring data between the high-precision arithmetic processing means and the low-precision arithmetic processing means. The first data conversion means performs a predetermined conversion on the data passing between the communication path and the high-precision arithmetic processing means so that the data passed to and from the high-precision arithmetic processing means at the connection destination is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is equal to or less than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path is equal to or less than the first accuracy.

A data processing circuit according to the present invention includes: a low-precision arithmetic circuit that performs a predetermined operation with a first accuracy; a high-precision arithmetic circuit that performs a predetermined operation with a second accuracy higher than the first accuracy; and a first data conversion circuit that is provided at the end, on the high-precision arithmetic circuit side, of a communication path for transferring data between the high-precision arithmetic circuit and the low-precision arithmetic circuit, and that performs a predetermined conversion on the data passing between the communication path and the high-precision arithmetic circuit. The first data conversion circuit performs the conversion so that the data passed to and from the high-precision arithmetic circuit at the connection destination is data that the high-precision arithmetic circuit can handle, the amount of data passing through the communication path is equal to or less than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path is equal to or less than the first accuracy.

In a data processing method according to the present invention, a first data conversion means is provided at the end, on the high-precision arithmetic processing means side, of a communication path for transferring data between a low-precision arithmetic processing means that performs a predetermined operation with a first accuracy and a high-precision arithmetic processing means that performs a predetermined operation with a second accuracy higher than the first accuracy. The first data conversion means performs a predetermined conversion on the data passing between the communication path and the high-precision arithmetic processing means so that the data passed to and from the high-precision arithmetic processing means at the connection destination is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is equal to or less than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path is equal to or less than the first accuracy.

The data processing method according to the present invention may also be configured so that, in addition to the conversion performed by the first data conversion means on the data passing between the communication path and the high-precision arithmetic processing means, a second data conversion means provided at the end of the communication path on the low-precision arithmetic processing means side performs a predetermined conversion on the data passing between the communication path and the low-precision arithmetic processing means. The second data conversion means performs this conversion so that the data passed to and from the low-precision arithmetic processing means at the connection destination is data that the low-precision arithmetic processing means can handle, the amount of data passing through the communication path is equal to or less than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path is equal to or less than the first accuracy.
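The role of the first data conversion means can be illustrated with a minimal sketch. It assumes, purely for illustration, that the first accuracy is a signed 16-bit fixed-point format with 8 fractional bits and that Python floats stand in for the second accuracy; the names `to_path`, `from_path`, and `FRAC_BITS` are hypothetical and do not appear in the specification.

```python
import struct

FRAC_BITS = 8                       # assumed: 8 fractional bits in a signed 16-bit word
INT16_MIN, INT16_MAX = -32768, 32767

def to_path(values):
    """High-precision side -> communication path: floats to packed 16-bit fixed point."""
    words = []
    for v in values:
        q = round(v * (1 << FRAC_BITS))
        q = max(INT16_MIN, min(INT16_MAX, q))   # saturate values outside the range
        words.append(q)
    return struct.pack(f"<{len(words)}h", *words)

def from_path(payload):
    """Communication path -> high-precision side: 16-bit fixed point back to floats."""
    n = len(payload) // 2
    return [w / (1 << FRAC_BITS) for w in struct.unpack(f"<{n}h", payload)]

weights = [0.5, -1.25, 3.14159, 200.0]
payload = to_path(weights)          # 2 bytes per value travel over the path
restored = from_path(payload)       # floats again on the high-precision side
```

In this sketch the payload never exceeds two bytes per value, which matches the data amount of the assumed first accuracy, while the high-precision side only ever sees values in its own format. Values outside the representable range are saturated, which is one possible design choice among several.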

According to the present invention, it is possible to further improve the efficiency of processing in which arithmetic operations requiring high precision and arithmetic operations not requiring high accuracy are mixed.

FIG. 1 is an explanatory diagram illustrating an outline of a learning method as an example of the data processing method of the present invention.
FIG. 2 is an explanatory diagram showing an example of the input and output of a unit and its connection with other units.
FIG. 3 is a block diagram showing a configuration example of the learning device of the first embodiment.
FIG. 4 is a configuration diagram illustrating an example of a hardware configuration of the learning processing unit 106.
FIG. 5 is an explanatory diagram showing an example of a combination of the calculation accuracy in the low-precision arithmetic circuit 11 and the calculation accuracy in the high-precision arithmetic circuit 12.
FIG. 6 is a schematic block diagram illustrating a configuration example of a computer according to the learning device 100.
FIG. 7 is a schematic configuration diagram illustrating an example of an arithmetic circuit.
FIG. 8 is a schematic configuration diagram illustrating another example of the arithmetic circuit.
FIG. 9 is a schematic configuration diagram illustrating another example of the arithmetic circuit.
FIG. 10 is a schematic configuration diagram illustrating another example of the arithmetic circuit.
FIG. 11 is a flowchart illustrating an example of an operation of the learning device 100 according to the first embodiment.
FIG. 12 is a flowchart illustrating a more specific operation example of the learning device 100.
FIG. 13 is a flowchart illustrating another example of a more specific operation of the learning device 100.
FIG. 14 is a flowchart illustrating another example of a more specific operation of the learning device 100.
FIG. 15 is an explanatory diagram illustrating a configuration example of a data processing device according to a second embodiment.
FIG. 16 is an explanatory diagram illustrating a configuration example of a data processing device according to the second embodiment.
FIG. 17 is a block diagram showing an outline of the data processing device of the present invention.
FIG. 18 is a configuration diagram illustrating a configuration of a data processing circuit of the present invention.
FIG. 19 is a configuration diagram illustrating another configuration of the data processing circuit of the present invention.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the following, the present invention is described using a learning process in deep learning as an example of a process in which operations requiring high accuracy and operations not requiring high accuracy are mixed. However, the data processing device and the data processing method of the present invention are not limited to a learning process, a learning device, or a learning method.

First, an outline of a learning process as an example of the data processing of the present invention will be described. FIG. 1A is an explanatory diagram showing an example of a general learning method in a neural network including one or more intermediate layers between an input layer and an output layer, and a circuit configuration therefor. FIG. 1B is an explanatory diagram showing an example of a learning method as an example of the data processing method of the present invention, and an example of a circuit configuration therefor.

In the example shown in FIG. 1A, a large-scale learning circuit 90 is used to learn the entire neural network, which is a predetermined discriminant model, in order to support a learning algorithm for general use.

In FIG. 1, the balloons attached to the circuits schematically show the directions and ranges of processing in the learning process of the neural network. In the balloons, reference numeral 51 (a circle in the figure) represents a unit corresponding to a neuron in the neural network. Reference numeral 52 (a line connecting units in the figure) represents an inter-unit connection. Reference numeral 53 (a thick arrow pointing right in the figure) indicates the inference processing and its range. Reference numeral 54 (a thick arrow pointing left in the figure) indicates the parameter update processing and its range. Although FIG. 1 shows an example of a feedforward neural network in which the input to each unit is the output of a unit in the preceding layer, the input to each unit is not limited to this. For example, when time series information is held, the input to each unit can include the output of a unit of the preceding layer at the previous time, as in a recurrent neural network. In such a case as well, the direction of the inference processing is considered to be the direction from the input layer to the output layer (forward direction). Inference processing performed in a predetermined order from the input layer in this way is also called “forward propagation”. On the other hand, the direction of the parameter update processing is not particularly limited. It may be the direction from the output layer to the input layer (reverse direction), as in the parameter update processing in the figure. The parameter update processing in the figure is an example of the error back propagation method, but the parameter update processing is not limited to the error back propagation method. For example, the parameter update processing may be STDP (Spike Timing Dependent Plasticity).

Not only for neural networks but for model learning in deep learning generally, the following learning method is one example. First, after learning data is input to the input layer, inference processing is performed that calculates the output of each unit in each layer, in the forward direction, up to the output layer (forward propagation: see arrow 53 in the figure). Next, based on an error calculated from the output of the output layer (the final output) and the relationship between input and output indicated by the learning data, parameter update processing is performed that traces the layers in the reverse direction from the output layer to the first layer and updates the parameters used to calculate the output of each unit in each layer so as to minimize the error (back propagation: see arrow 54 in the figure).
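The two phases described above can be sketched in a few lines. This is a minimal illustration, not the method of the specification: it assumes a sigmoid activation, a squared-error function, plain gradient descent, and no intercept terms, and every name and numeric value in it is hypothetical.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def forward(weights, x):
    """Inference (forward propagation): returns the outputs of every layer."""
    outputs = [list(x)]
    for W in weights:                          # W[i][k]: weight from unit k to unit i
        prev = outputs[-1]
        outputs.append([sigmoid(sum(w * p for w, p in zip(row, prev))) for row in W])
    return outputs

def backward(weights, outputs, target, lr=0.5):
    """Parameter update (error back propagation), modifying the weights in place."""
    # Error signal at the output layer for squared error and sigmoid units.
    delta = [(z - t) * z * (1.0 - z) for z, t in zip(outputs[-1], target)]
    for L in range(len(weights) - 1, -1, -1):  # trace the layers in reverse
        W, prev = weights[L], outputs[L]
        if L > 0:
            # Error signal of the preceding layer, computed before W is changed.
            prev_delta = [
                sum(W[i][k] * delta[i] for i in range(len(W))) * prev[k] * (1.0 - prev[k])
                for k in range(len(prev))
            ]
        else:
            prev_delta = []
        for i, row in enumerate(W):
            for k in range(len(row)):
                row[k] -= lr * delta[i] * prev[k]
        delta = prev_delta

def squared_error(output, target):
    return 0.5 * sum((z - t) ** 2 for z, t in zip(output, target))

weights = [[[0.1, -0.2], [0.4, 0.3]], [[0.2, -0.5]]]   # 2 inputs, 2 hidden units, 1 output
x, target = [1.0, 0.5], [1.0]
err_before = squared_error(forward(weights, x)[-1], target)
for _ in range(20):
    backward(weights, forward(weights, x), target)
err_after = squared_error(forward(weights, x)[-1], target)
```

Repeating the forward and backward passes on the same sample reduces the error in this toy setting; in practice the learning data would change between iterations.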

As shown in FIG. 1A, when the entire model is the learning target, the parameter update processing updates, in all layers following the input layer (the first to n-th layers), the parameters used to calculate the output of each unit in each layer (for example, the weights of the inter-unit connections that connect each unit in a layer to units in other layers). By repeating such parameter update processing a plurality of times while changing the learning data, a learned model having a high recognition rate, for example, can be generated. FIG. 1A shows a large-scale learning circuit 90 that performs the above-described inference processing and parameter update processing with high calculation accuracy, as an implementation example of an arithmetic circuit that performs such learning. However, the higher the calculation accuracy of the inference processing and the parameter update processing, and the wider the calculation range of that processing, the larger the number of expansion terms of the error function and the larger the circuit scale, resulting in a large increase in power consumption.

On the other hand, in the present invention, as shown in FIG. 1B, only a part of the model is set as the learning target. Note that learning here refers to the parameter update processing, which is the actual learning operation, as described above. When only a part of the model is to be learned, the processing up to forward propagation is performed in the same manner as described above. Then, based on an error calculated from the output of the output layer (the final output) and the relationship between input and output indicated by the learning data, parameter update processing is performed only for designated units (for example, the units in each layer from the n-th layer, which is the output layer, back to the k-th layer), updating the parameters used to calculate the output of those units (for example, the weights of their connections with other units).

FIG. 1B shows, as an implementation example of the arithmetic circuit 10 that performs such learning, a combination of a high-precision arithmetic circuit 12, which performs the parameter update processing of the designated units with high operation accuracy, and a low-precision arithmetic circuit 11, which performs the inference processing of at least the designated units with lower operation accuracy than the high-precision arithmetic circuit 12. In addition to providing two arithmetic circuits with different operation accuracies in this way, the high-precision arithmetic circuit 12 is made to perform the parameter update processing for the units that require high-precision operation, and the low-precision arithmetic circuit 11 is made to perform the other processing that does not require high-precision operation. Thus, in the learning operation on one piece of learning data, at least a part of the inference processing is performed with low calculation accuracy and at least a part of the parameter update processing is performed with high calculation accuracy. By also optimizing the range of the parameter update processing performed in this way, computer resources are used more efficiently (for example, with lower power consumption) while sufficient calculation accuracy is secured.

Although FIG. 1B shows an example in which some layers on the output side are set as the range for updating parameters (the actual learning range), the range for updating parameters is not limited to layers on the output side. It is also possible to designate layers individually, such as the odd-numbered layers or the even-numbered layers among the first to n-th layers. FIG. 1B also shows an example in which the range of the parameter update processing itself is limited. Alternatively, the range of the parameter update processing itself may be left unlimited, and only the range of the parameter update processing performed with high calculation accuracy may be limited. That is, the parameter update processing can be performed with high calculation accuracy only for some of the units and with low calculation accuracy for the other units. With respect to the parameter update processing, the units can thus be divided into three types: units updated by high-precision calculation, units updated by low-precision calculation, and units that are not updated (whose parameters are fixed at that time).
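The three-way division of the parameter update processing can be sketched as follows. This is an illustration under assumptions that are not in the specification: one mode is assigned per layer, gradients are given as plain lists, low-precision arithmetic is imitated by rounding to a fixed-point grid with 4 fractional bits, and the names `quantize` and `update_layers` are hypothetical.

```python
def quantize(v, frac_bits=4):
    """Round v to a coarse fixed-point grid (stand-in for low-precision arithmetic)."""
    scale = 1 << frac_bits
    return round(v * scale) / scale

def update_layers(weights, grads, modes, lr=0.1):
    """Apply one update step per layer according to its mode: 'high', 'low' or 'fixed'."""
    new_weights = []
    for w_layer, g_layer, mode in zip(weights, grads, modes):
        if mode == "fixed":      # parameters are kept as they are
            new_weights.append(list(w_layer))
        elif mode == "low":      # result of the update is stored on the coarse grid
            new_weights.append([quantize(w - lr * g) for w, g in zip(w_layer, g_layer)])
        elif mode == "high":     # update is carried out in full float precision
            new_weights.append([w - lr * g for w, g in zip(w_layer, g_layer)])
        else:
            raise ValueError(f"unknown mode: {mode}")
    return new_weights

weights = [[0.5, -0.25], [0.5, -0.25], [0.5, -0.25]]
grads = [[0.3, 0.3], [0.3, 0.3], [0.3, 0.3]]
modes = ["fixed", "low", "high"]     # e.g. input side fixed, output side high precision
updated = update_layers(weights, grads, modes)
```

In this toy case the step of 0.03 is smaller than half the grid spacing of 0.0625, so the low-precision layer ends up unchanged while the high-precision layer moves. This is one way to see why small updates can benefit from the higher accuracy.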

As another way of dividing the processing between high-precision operation and low-precision operation, the inference processing of all units may be performed by low-precision operation and the parameter update processing of all units by high-precision operation. It is also possible, for example, to perform the inference processing of all units by low-precision operation and the parameter update processing of only some units by high-precision operation. In that case, the remaining units excluded from the high-precision operation may have their parameter update processing performed by low-precision operation, or may be excluded from the parameter update processing altogether. It is further possible, for example, to perform both the inference processing and the parameter update processing by low-precision operation for some units and by high-precision operation for the remaining units.

In other words, in the learning method as an example of the data processing method of the present invention, any configuration may be used as long as the learning device includes a low-precision arithmetic circuit having relatively low operation accuracy and a high-precision arithmetic circuit having relatively high operation accuracy, and causes the low-precision arithmetic circuit to perform the inference processing for at least some of the units and the high-precision arithmetic circuit to perform the parameter update processing for at least some of the units. The inference processing of the remaining units may be performed by either the low-precision arithmetic circuit or the high-precision arithmetic circuit. The parameter update processing of the remaining units may be performed by the low-precision arithmetic circuit, or may be omitted altogether. There is no particular limitation on which units are subject to high-precision or low-precision inference processing, or on which units are subject to high-precision parameter update processing, low-precision parameter update processing, or no parameter update processing.

The above describes an example using two arithmetic circuits with different operation accuracies, but the same basically applies when more than two arithmetic circuits with different operation accuracies are used. That is, as long as the parameter update processing of a certain unit is performed by an arithmetic circuit whose operation accuracy is higher than that of the arithmetic circuit performing the inference processing of a certain unit, there is no particular limitation on which arithmetic circuit performs the inference processing and the parameter update processing of the other units, or on whether that processing is performed at all.

FIG. 2 is an explanatory diagram showing an example of the input and output of a unit and its connection with other units when focusing on one unit. FIG. 2A shows an example of the input and output of one unit, and FIG. 2B shows an example of the connections between units arranged in two layers. As shown in FIG. 2A, when one unit has four inputs ($x_1$ to $x_4$) and one output ($z$), the operation of the unit is expressed, for example, by equation (1A). Here, $f()$ represents an activation function.

$$z = f(u) \tag{1A}$$
where
$$u = a + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 \tag{1B}$$

In equation (1B), $a$ represents an intercept, and $w_1$ to $w_4$ represent parameters such as the weights corresponding to each input ($x_1$ to $x_4$).
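Equations (1A) and (1B) can be evaluated directly for one unit, as in the short sketch below. The specification does not fix the activation function, so `tanh` is used here only as an assumption, and the numeric values are made up for illustration.

```python
import math

def unit_output(x, w, a, f=math.tanh):
    """Output z of one unit: u = a + sum_k w_k * x_k (1B), z = f(u) (1A)."""
    u = a + sum(wk * xk for wk, xk in zip(w, x))
    return f(u)

x = [1.0, 2.0, 3.0, 4.0]          # four inputs x_1 .. x_4
w = [0.5, -0.25, 0.125, 0.25]     # weights w_1 .. w_4
a = -0.375                        # intercept

z = unit_output(x, w, a)                    # u = 1.0, so z = tanh(1.0)
u = unit_output(x, w, a, f=lambda v: v)     # identity activation exposes u itself
```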

On the other hand, as shown in FIG. 2B, when units arranged in two layers are connected between the layers, and attention is paid to the subsequent layer, the outputs ($z_1$ to $z_3$) of the units in that layer with respect to the inputs ($x_1$ to $x_4$) are expressed, for example, as follows. Note that $i$ is an identifier of a unit within the same layer ($i = 1$ to $3$ in this example).

$$z_i = f(u_i) \tag{2A}$$
where
$$u_i = a + w_{i,1} x_1 + w_{i,2} x_2 + w_{i,3} x_3 + w_{i,4} x_4 \tag{2B}$$

In the following, Expression (2B) may be simplified and written as $u_i = \sum_k w_{i,k} x_k$, with the intercept $a$ omitted. The intercept $a$ can also be regarded as the coefficient (one of the parameters) of a constant term whose value is 1. Here, $k$ identifies an input to each unit in the layer, more specifically the other unit that supplies the input. If the inputs to each unit in the layer are only the outputs of the units of the preceding layer, the simplified equation above can also be written as $u_i^{(L)} = \sum_k w_{i,k}^{(L)} z_k^{(L-1)}$, where $L$ is a layer identifier. In these equations, $w_{i,k}$ is a parameter of each unit $i$ in the layer (the $L$-th layer), more specifically the weight of the connection (inter-unit connection) between unit $i$ and another unit $k$. In the following, the function that determines the output value of a unit (the activation function) may also be omitted for simplicity, and the operation written as $z = \sum w x$ without distinguishing between units.
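The remark that the intercept can be treated as the coefficient of a constant term of value 1 can be checked with a small sketch. The function names and the numeric values are hypothetical; the layer has three units and four inputs as in the example of FIG. 2B.

```python
def layer_u(W, a, x):
    """Equation (2B): u_i = a + sum_k w_{i,k} * x_k for each unit i of the layer."""
    return [a + sum(w * xk for w, xk in zip(row, x)) for row in W]

def layer_u_augmented(W_aug, x):
    """Same values, with the intercept folded in as the weight of a constant input 1."""
    x_aug = [1.0] + list(x)
    return [sum(w * xk for w, xk in zip(row, x_aug)) for row in W_aug]

W = [[1, 0, 2, -1],
     [0.5, 0.5, 0.5, 0.5],
     [-1, 1, -1, 1]]            # W[i][k]: weight w_{i,k}
a = 2                           # shared intercept, as written in (2B)
x = [1, 2, 3, 4]

W_aug = [[a] + row for row in W]    # prepend the intercept as an extra weight
u_plain = layer_u(W, a, x)
u_aug = layer_u_augmented(W_aug, x)
```

Both forms give the same values of $u_i$, so an update rule written only for weights also covers the intercept once the constant input is added.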

In the above example, the calculation for obtaining the output z from the input x for a certain unit corresponds to the inference processing in the unit. At this time, the parameter w is fixed. On the other hand, the calculation for obtaining the parameter w for a certain unit corresponds to a parameter updating process in the unit.

Embodiment 1.
FIG. 3 is a block diagram illustrating a configuration example of the learning device according to the first embodiment. The learning device 100 illustrated in FIG. 3 includes a pre-learning model storage unit 101, a learning data storage unit 102, a learning processing unit 106, and a post-learning model storage unit 107.

The pre-learning model storage unit 101 stores information on the model before learning. The information on the model before learning may include the initial values of the parameters.

The learning data storage unit 102 stores learning data that is data used for learning a model. The format of the learning data is not particularly limited.

The learning processing unit 106 performs learning of the model stored in the pre-learning model storage unit 101 using the learning data stored in the learning data storage unit 102.

The learning processing unit 106 of the present embodiment includes at least the high-efficiency inference processing unit 103a, the high-precision parameter update processing unit 104b, and the control unit 105. As shown in FIG. 3, the learning processing unit 106 may further include a high-precision inference processing unit 103b and a high-efficiency parameter update processing unit 104a.

The high-efficiency inference processing unit 103a performs inference processing for a specified layer or unit with a first calculation accuracy.

The high-precision parameter update processing unit 104b performs a parameter update process for a specified layer, unit, or parameter with a second operation accuracy higher than the first operation accuracy.

The control unit 105 controls each processing unit that performs the learning process (in this example, the high-efficiency inference processing unit 103a, the high-precision inference processing unit 103b, the high-efficiency parameter update processing unit 104a, and the high-precision parameter update processing unit 104b) so that the necessary learning processing is performed. More specifically, the control unit 105 reads the pre-learning model and the learning data, and controls the switching of the calculation accuracy for the learning process by giving calculation instructions to each processing unit that performs the learning process. A calculation instruction includes the designation of the unit to be calculated and the input of the parameters necessary for the calculation.
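The switching performed by the control unit 105 can be pictured as a routing table from calculation instructions to processing units. The sketch below is only an illustration: the `Instruction` fields, the `dispatch` function, and the idea of a boolean precision flag are assumptions, and only the reference numerals 103a, 103b, 104a, and 104b come from the description.

```python
from dataclasses import dataclass, field

@dataclass
class Instruction:
    kind: str                  # "inference" or "update"
    unit: str                  # hypothetical identifier of the target unit or layer
    high_precision: bool       # whether the second (higher) accuracy is required
    params: dict = field(default_factory=dict)   # parameters needed for the calculation

# Processing units of the learning processing unit 106, keyed by (kind, precision).
ROUTES = {
    ("inference", False): "103a",   # high-efficiency inference processing unit
    ("inference", True): "103b",    # high-precision inference processing unit
    ("update", False): "104a",      # high-efficiency parameter update processing unit
    ("update", True): "104b",       # high-precision parameter update processing unit
}

def dispatch(instr):
    """Return the reference numeral of the processing unit that should execute instr."""
    return ROUTES[(instr.kind, instr.high_precision)]

plan = [
    Instruction("inference", "layer1", False),
    Instruction("inference", "layer2", False),
    Instruction("update", "layer2", True),
]
targets = [dispatch(i) for i in plan]
```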

The post-learning model storage unit 107 stores information on the model after learning. The information on the model after learning may include the updated parameter values of each unit.

FIG. 4 is a configuration diagram showing an example of a hardware configuration of the learning processing unit 106. As shown in FIG. 4, the learning processing unit 106 may be realized by an arithmetic processing device or the like in which the low-precision arithmetic circuit 11, the high-precision arithmetic circuit 12, the memory 13, and the control device 14 are connected via the bus 15. Note that the high-precision arithmetic circuit 12 may be any circuit that can perform operations with higher operation accuracy than the low-precision arithmetic circuit 11.

In that case, the high-efficiency inference processing unit 103a and the high-efficiency parameter update processing unit 104a may be realized by, for example, the low-precision arithmetic circuit 11. The high-precision inference processing unit 103b and the high-precision parameter update processing unit 104b may be realized by, for example, the high-precision arithmetic circuit 12. Further, the control unit 105 may be realized by, for example, the control device 14.

In this example, the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 are each connected to the bus 15 and can exchange data, for example by notifying each other of arithmetic results, via the bus 15. Note that the memory 13 may be further connected to the bus 15. In this case, the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 can also exchange data via the memory 13, and the memory 13 is then treated as a part of the communication path. The memory 13 may be mounted on the same chip as the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 as on-chip memory. That is, the low-precision arithmetic circuit 11, the high-precision arithmetic circuit 12, and the memory 13 may be internally connected within the chip. Alternatively, the memory 13 need not be mounted on the same chip as the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12, and may be off-chip memory. That is, the memory 13 may be externally connected via an external memory interface.

In the present embodiment, a measure of the breadth and fineness of the range of numerical data actually used for calculation in a processing unit that performs the learning process (particularly, the inference process and the parameter update process) is referred to as “precision” or “operation accuracy”. More specifically, it is a measure of the breadth and fineness of the range of numerical data determined by the bit width, the handling of the decimal point, and the like in the arithmetic circuit that implements the processing unit. FIG. 5 is an explanatory diagram showing an example of a combination of the low operation accuracy, which is the operation accuracy of the low-precision arithmetic circuit 11, and the high operation accuracy, which is the operation accuracy of the high-precision arithmetic circuit 12.

The combination of the calculation accuracy in the low-precision arithmetic circuit 11 and the calculation accuracy in the high-precision arithmetic circuit 12 is not limited to that shown in FIG. For example, the calculation accuracy in the low-precision arithmetic circuit 11 (low calculation accuracy) may be any of fixed-point {1, 2, 8, 16} bits or integer {1, 2, 8, 16} bits, and the calculation accuracy in the high-precision arithmetic circuit 12 (high calculation accuracy) may be any of fixed-point {2, 8, 16, 32} bits, floating-point {9, 16, 32} bits, or power-of-2 floating-point {8, 16, 24, 32} bits. However, the high calculation accuracy is assumed to be higher than the low calculation accuracy (for example, the range of numerical data is wider, the numerical data is finer, or the number of significant digits that can be expressed is larger).
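As a non-limiting illustration of how the bit width and the handling of the decimal point determine the breadth and fineness of the representable range, the following sketch quantizes a value to two signed fixed-point formats. The function names and the split of each bit width into integer and fractional bits (4 fractional bits out of 8, and 8 out of 16) are assumptions made only for this example; the description itself specifies only the total bit widths.

```python
def quantize_fixed(value: float, total_bits: int, frac_bits: int) -> float:
    """Round `value` to a signed fixed-point grid and saturate at the range limits."""
    scale = 1 << frac_bits                    # grid steps per unit
    lowest = -(1 << (total_bits - 1))         # most negative integer code
    highest = (1 << (total_bits - 1)) - 1     # most positive integer code
    # Python's round() rounds halves to the nearest even integer.
    code = max(lowest, min(highest, round(value * scale)))
    return code / scale


def fixed_range_and_step(total_bits: int, frac_bits: int) -> tuple[float, float, float]:
    """Return (minimum, maximum, step) of the representable values."""
    scale = 1 << frac_bits
    return (-(1 << (total_bits - 1)) / scale,
            ((1 << (total_bits - 1)) - 1) / scale,
            1 / scale)


if __name__ == "__main__":
    # Hypothetical low accuracy: 8-bit fixed point with 4 fractional bits.
    print(quantize_fixed(0.7071, total_bits=8, frac_bits=4))    # 0.6875
    # Hypothetical high accuracy: 16-bit fixed point with 8 fractional bits.
    print(quantize_fixed(0.7071, total_bits=16, frac_bits=8))   # 0.70703125
    print(fixed_range_and_step(8, 4))     # (-8.0, 7.9375, 0.0625)
    print(fixed_range_and_step(16, 8))    # (-128.0, 127.99609375, 0.00390625)
```

In this example the 16-bit format has both a wider range and a finer step than the 8-bit format, which corresponds to the sense in which the high calculation accuracy is "higher" than the low calculation accuracy.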

FIG. 6 is a schematic block diagram illustrating a configuration example of a computer according to the learning device 100. The computer 1000 includes a processor 1008, a main storage device 1002, an auxiliary storage device 1003, an interface 1004, a display device 1005, and an input device 1006. Further, the processor 1008 may include various arithmetic and processing devices such as the CPU 1001 and the GPU 1007.

The learning device 100 may be implemented in, for example, a computer 1000 as shown in FIG. In this case, the operation of the learning device 100 (in particular, the control unit 105) may be stored in the auxiliary storage device 1003 in the form of a program. The CPU 1001 reads the program from the auxiliary storage device 1003, loads it into the main storage device 1002, and performs predetermined processing in the learning device 100 according to the program. Note that the CPU 1001 is an example of an information processing device that operates according to a program. Instead of a CPU (Central Processing Unit), the computer 1000 may include, for example, an MPU (Micro Processing Unit), an MCU (Memory Control Unit), or a GPU (Graphics Processing Unit).

FIG. 6 shows an example in which the computer 1000 further includes, in addition to the CPU 1001, a GPU 1007 on which the above-described low-precision arithmetic circuit 11 and high-precision arithmetic circuit 12 are mounted. However, the configuration is not limited to this example. The computer 1000 need only include another processor or an arithmetic unit (such as a MAC (multiplier-accumulator), a multiplier tree, or an ALU (Arithmetic Logic Unit) array, which will be described later) that implements the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12. Further, the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 may be mounted on different chips, and the specific chip configuration is not particularly limited.

The auxiliary storage device 1003 is an example of a non-transitory tangible medium. Other examples of non-transitory tangible media include a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD-ROM, and a semiconductor memory connected via the interface 1004. When the program is distributed to the computer 1000 via a communication line, the computer that has received the distribution may load the program into the main storage device 1002 and execute a predetermined process in the learning device 100.

The program may be for realizing a part of a predetermined process in the learning device 100. Further, the program may be a difference program that realizes a predetermined process in the learning device 100 in combination with another program already stored in the auxiliary storage device 1003.

The interface 1004 transmits and receives information to and from another device. The display device 1005 presents information to the user. Further, the input device 1006 receives input of information from a user.

Also, some elements of the computer 1000 can be omitted depending on the processing content of the learning device 100. For example, if the computer 1000 does not present information to the user, the display device 1005 can be omitted. For example, if the computer 1000 does not accept information input from a user, the input device 1006 can be omitted.

Part or all of the above components are implemented by a general-purpose or dedicated circuit (Circuitry), a processor, or a combination thereof. These may be constituted by a single chip, or may be constituted by a plurality of chips connected via a bus. In addition, some or all of the above-described components may be realized by a combination of the above-described circuit and the like and a program.

When some or all of the above-described components are realized by a plurality of information processing devices or circuits, the plurality of information processing devices or circuits may be centrally arranged or may be distributed. For example, the information processing devices, circuits, and the like may be implemented in a form in which each is connected via a communication network, such as a client and server system or a cloud computing system.

[Circuit configuration]
Next, some examples of the configuration of an inference circuit, which is an implementation example of at least the high-efficiency inference processing unit 103a, will be described. For example, for each unit in a specified layer or for a specified unit, the high-efficiency inference processing unit 103a may, on receiving an input to the unit, perform inference processing that calculates the output of the unit with a predetermined low calculation accuracy, and output the calculation result. At this time, the high-efficiency inference processing unit 103a may receive, as inputs, the value of the input and the values of the other variables (parameters such as weights and intercepts) used for calculating the output of the unit, and perform the above processing. Hereinafter, the operation performed in the inference processing may be referred to as an inference operation.

Hereinafter, a circuit for performing an inference operation is referred to as an “inference circuit”, and in particular, a circuit for performing an inference operation with lower operation accuracy than that of the parameter update operation performed by the high-precision parameter update processing unit 104b is referred to as a “high-efficiency inference circuit”. In this manner, the operation accuracy of the inference circuit is made as low as possible, and at least lower than the operation accuracy of the parameter update operation performed by the high-precision parameter update processing unit 104b (for example, by changing the bit width from 32 bits to 16 bits, or by changing floating-point operations to fixed-point operations), thereby reducing power consumption. To distinguish it from the high-efficiency inference circuit, a circuit for performing an inference operation with the same operation accuracy as that of the parameter update operation performed by the high-precision parameter update processing unit 104b may be referred to as a “high-precision inference circuit”. The above-described high-precision inference processing unit (not shown) may be realized by such a high-precision inference circuit.

The configuration of the inference circuit described below can be realized regardless of whether the inference operation is performed with high accuracy or with low accuracy. That is, the difference between the high-efficiency inference processing unit 103a and the high-precision inference processing unit 103b may be only the accuracy of each variable, an adder, and a multiplier used for the operation in the arithmetic circuit in which the operation of the processing unit is implemented.

The simplest example of the inference circuit has a configuration in which one multiplier-adder (MAC) 221 in which a multiplier and an adder are combined is provided (see the arithmetic circuit 22a in FIG. 7A). Reference numeral 21 represents a bus.

The MAC 221 may include a multiplier, an adder, storage elements holding three inputs, and a storage element holding one output (see FIG. 7B). The MAC 221 illustrated in FIG. 7B is an example of an arithmetic circuit that, on receiving three variables a, w, and x, calculates one output variable z = a + w * x. In this example, z corresponds to the output of the unit, a and w correspond to parameters (fixed in the inference processing), and x corresponds to the input of the unit. In such a configuration, the operation accuracy of the circuit is determined by the bit width of the multiplier and the adder included in the circuit and by the handling of the decimal point (floating point, fixed point, or the like). For example, when the high-efficiency inference processing unit 103a is implemented by the arithmetic circuit 22a, the variables (a, w, x, z) and the operations by the adder and the multiplier in the MAC 221 included in the arithmetic circuit 22a need only correspond to the low operation accuracy (first operation accuracy). At this time, the variables, the addition, and the multiplication in the circuit need not all have the same precision (the same applies hereinafter). For example, it is only necessary that the precision used for each of the variables, the addition, and the multiplication be lower than the precision used for each of the variables, the addition, and the multiplication in the arithmetic circuit that implements the high-precision parameter update processing unit 104b.
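The behavior of such a MAC can be sketched in software as follows. This is only an illustrative emulation under stated assumptions: 16-bit and 32-bit IEEE 754 floating point are chosen here as the low and high operation accuracies, rounding is emulated with Python's standard `struct` module, and the names `to_f16`, `to_f32`, and `mac` are hypothetical.

```python
import struct


def to_f16(value: float) -> float:
    """Round to the nearest IEEE 754 half-precision value (assumed low accuracy)."""
    return struct.unpack("e", struct.pack("e", value))[0]


def to_f32(value: float) -> float:
    """Round to the nearest IEEE 754 single-precision value (assumed high accuracy)."""
    return struct.unpack("f", struct.pack("f", value))[0]


def mac(a: float, w: float, x: float, cast) -> float:
    """Compute z = a + w * x, rounding the inputs, the product, and the sum with `cast`."""
    a, w, x = cast(a), cast(w), cast(x)   # storage elements holding the three inputs
    product = cast(w * x)                 # multiplier output
    return cast(a + product)              # adder output, held as z


if __name__ == "__main__":
    z_low = mac(0.1, 0.3, 0.7, to_f16)
    z_high = mac(0.1, 0.3, 0.7, to_f32)
    # The exact result is 0.31; the low-accuracy result deviates from it more.
    print(abs(z_low - 0.31), abs(z_high - 0.31))
```

The same `mac` function is used for both accuracies, and only the `cast` argument differs. This mirrors the statement that the circuit configuration can be common while the precision of the variables, adder, and multiplier differs.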

FIGS. 8 to 10 are schematic configuration diagrams showing another example of an operation circuit (inference circuit) for inference operation. The inference circuit may have a configuration in which a plurality of MACs 221 are connected in parallel (a configuration of a GPU), for example, as in an arithmetic circuit 22b illustrated in FIG. Even in such a configuration, the operation accuracy of the circuit is determined by the bit width of the multiplier and the adder included in the circuit and the handling of the decimal point (floating point or fixed point, etc.).

The inference circuit may have a configuration in which a plurality of multiply-addition trees 223 are connected in parallel via a memory layer 222, for example, as in an arithmetic circuit 22c shown in FIG. The multiply-add tree 223 shown in FIG. 9 is a circuit having a configuration in which four multipliers, two adders, and one adder are connected in a tree shape. Note that an example of the arithmetic circuit 22c shown in FIG. 9 is also disclosed in Non-Patent Document 3. Even in such a configuration, the operation accuracy of the circuit is determined by the bit width of the multiplier and the adder included in the circuit and the handling of the decimal point (floating point or fixed point, etc.).
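For illustration, one multiply-add tree of the kind described (four multipliers feeding two adders, which feed one adder) can be emulated as follows. The function names are hypothetical, and half-precision rounding is assumed for the low operation accuracy purely as an example.

```python
import struct


def to_f16(value: float) -> float:
    """Round to the nearest IEEE 754 half-precision value (assumed low accuracy)."""
    return struct.unpack("e", struct.pack("e", value))[0]


def multiply_add_tree(weights, inputs, cast=to_f16) -> float:
    """Emulate a tree of four multipliers, two adders, and one final adder."""
    if len(weights) != 4 or len(inputs) != 4:
        raise ValueError("this tree takes exactly four weight/input pairs")
    # First level: four multipliers operating side by side.
    products = [cast(cast(w) * cast(x)) for w, x in zip(weights, inputs)]
    # Second level: two adders, each summing a pair of products.
    left = cast(products[0] + products[1])
    right = cast(products[2] + products[3])
    # Third level: one adder producing the tree output.
    return cast(left + right)


if __name__ == "__main__":
    # 1*5 + 2*6 + 3*7 + 4*8 = 70, which half precision represents exactly.
    print(multiply_add_tree([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))  # 70.0
```

Passing a different `cast` (for example, a 32-bit rounding function) changes the operation accuracy without changing the tree structure.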

The inference circuit may have a configuration in which a plurality of ALUs 224 are connected in an array via the memory layer 222 (systolic array configuration), for example, as in an arithmetic circuit 22d shown in FIG. An example of the arithmetic circuit 22d shown in FIG. 10 is also disclosed in Non-Patent Document 1. Even in such a configuration, the operation accuracy of the circuit is determined by the bit width of the multiplier and the adder included in the circuit and the handling of the decimal point (floating point or fixed point, etc.).

For example, when the arithmetic circuit 22b, the arithmetic circuit 22c, or the arithmetic circuit 22d shown in FIGS. realizes the high-efficiency inference processing unit 103a, each variable used for the operation in the circuit and the operations by the adder and the multiplier need only correspond to the low calculation accuracy (first calculation accuracy).

On the other hand, for example, when the high-precision inference processing unit 103b is realized by the arithmetic circuit 22a, the arithmetic circuit 22b, the arithmetic circuit 22c, or the arithmetic circuit 22d, each variable used for the operation in the circuit and the operations by the adder and the multiplier need only correspond to the high calculation accuracy (second calculation accuracy).

Next, some examples of the configuration of a parameter update circuit, which is an implementation example of at least the high-precision parameter update processing unit 104b, will be described. For example, for each parameter in each unit of a specified layer, for each parameter in a specified unit, or for a specified parameter, the high-precision parameter update processing unit 104b may perform, with a predetermined high calculation accuracy, parameter update processing that solves an optimization problem for an objective function, such as an error function, that includes the parameter as an adjustment parameter, and thereby updates the adjustment parameter, and may output the updated value. At that time, the high-precision parameter update processing unit 104b may receive, as inputs, the values of the variables used in solving the optimization problem (which may include the value of the parameter before updating), and perform the above processing. Hereinafter, the operation performed in the parameter update processing may be referred to as a parameter update operation.
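One reason the parameter update operation may call for higher operation accuracy can be illustrated with the following sketch. It assumes, purely for illustration, that the optimization problem is solved by gradient descent (w ← w − lr × grad) and that the two accuracies are 16-bit and 32-bit IEEE 754 floating point; the description does not prescribe either choice, and the function names are hypothetical.

```python
import struct


def to_f16(value: float) -> float:
    """Round to the nearest IEEE 754 half-precision value (assumed low accuracy)."""
    return struct.unpack("e", struct.pack("e", value))[0]


def to_f32(value: float) -> float:
    """Round to the nearest IEEE 754 single-precision value (assumed high accuracy)."""
    return struct.unpack("f", struct.pack("f", value))[0]


def update_step(weight: float, grad: float, lr: float, cast) -> float:
    """One assumed gradient-descent step, rounded with `cast` after each operation."""
    update = cast(cast(lr) * cast(grad))
    return cast(cast(weight) - update)


def run_updates(cast, steps: int = 100) -> float:
    """Apply the same small update repeatedly, starting from a weight of 1.0."""
    weight = 1.0
    for _ in range(steps):
        weight = update_step(weight, grad=0.01, lr=0.01, cast=cast)
    return weight


if __name__ == "__main__":
    # In half precision each update (about 1e-4) is smaller than half the spacing
    # of representable values just below 1.0, so the weight never moves.
    print(run_updates(to_f16))   # 1.0
    # In single precision the updates accumulate to roughly 1.0 - 100 * 1e-4.
    print(run_updates(to_f32))
```

In this example the small updates are rounded away at the low accuracy and the weight stays unchanged, whereas at the high accuracy the weight is adjusted as intended.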

Hereinafter, a circuit for performing the parameter update operation is referred to as a “parameter update circuit”, and in particular, a circuit for performing the parameter update operation with higher operation accuracy than that of the inference operation performed by the high-efficiency inference processing unit 103a is referred to as a “high-precision parameter update circuit”. Note that, to distinguish it from the high-precision parameter update circuit, a circuit for performing a parameter update operation with the same operation accuracy as the inference operation performed by the high-efficiency inference processing unit 103a may be referred to as a “high-efficiency parameter update circuit”. The above-described high-efficiency parameter update processing unit (not shown) may be realized by such a high-efficiency parameter update circuit.

The configuration of the parameter update circuit described below can be realized irrespective of whether the parameter update operation is performed with high accuracy or with low accuracy. In other words, the difference between the high-efficiency parameter update processing unit 104a and the high-precision parameter update processing unit 104b may be only the accuracy of each variable, adder, and multiplier used for the operation in the arithmetic circuit in which the operation of the processing unit is implemented.

The simplest example of the parameter update circuit has, like the inference circuit, a configuration including one multiplier-adder (MAC) 221 in which a multiplier and an adder are combined (see the arithmetic circuit 22a in FIG. 7A, the MAC 221 in FIG. 7B, and the like). The parameter update circuit can also be realized by, for example, the arithmetic circuits 22b, 22c, and 22d shown in FIGS. That is, the arithmetic circuits shown in FIGS. 7 to 10 are also examples of arithmetic circuits for the parameter update operation.

For example, when the high-precision parameter update processing unit 104b is realized by the arithmetic circuit 22a, the arithmetic circuit 22b, the arithmetic circuit 22c, or the arithmetic circuit 22d, each variable used for the operation in the circuit and the operations by the adder and the multiplier need only correspond to the high calculation accuracy (second calculation accuracy). At this time, the variables, the addition, and the multiplication need not all have the same precision. It is only required that the precision of each of the variables, the addition, and the multiplication used for the parameter update operation in the circuit be higher than the precision of each of the variables, the addition, and the multiplication used for the inference operation in the arithmetic circuit that realizes the high-efficiency inference processing unit 103a.

On the other hand, for example, when the high-efficiency parameter update processing unit 104a is realized by the arithmetic circuit 22a, the arithmetic circuit 22b, the arithmetic circuit 22c, or the arithmetic circuit 22d, each variable used for the operation in the circuit and the operations by the adder and the multiplier need only correspond to the low operation accuracy (first operation accuracy).

[Operation]
Next, the operation of the learning device 100 of the present embodiment will be described. FIG. 11 is a flowchart illustrating an example of the operation of the learning device 100 according to the present embodiment. The operation illustrated in FIG. 11 is performed based on, for example, control by the control unit 105.

In the example shown in FIG. 11, first, the control unit 105 reads the pre-learning model from the pre-learning model storage unit 101 and also reads the learning data from the learning data storage unit 102 (step S11).

Next, the control unit 105 controls the high-efficiency inference processing unit 103a and the high-precision inference processing unit 103b as necessary so that inference processing is performed sequentially on each unit included in the first to n-th layers (step S12: forward propagation). At this time, the control unit 105 causes the high-efficiency inference processing unit 103a to perform the inference processing of at least some of the units. The control unit 105 may cause the high-efficiency inference processing unit 103a to perform the inference processing for all units, or for only some units. When causing the high-efficiency inference processing unit 103a to perform the inference processing of only some units in the forward propagation, the control unit 105 may cause the high-precision inference processing unit 103b to perform the inference processing of the remaining units.

The high-efficiency inference processing unit 103a and the high-precision inference processing unit 103b execute inference processing for a specified layer or unit in accordance with an instruction from the control unit 105.

Next, the control unit 105 controls the high-efficiency parameter update processing unit 104a and the high-precision parameter update processing unit 104b as necessary so that parameter update processing is performed on predetermined parameters among the parameters used to calculate the outputs of the units of each layer (step S13: parameter update processing). At this time, the control unit 105 causes the high-precision parameter update processing unit 104b to perform the parameter update processing on at least some of the parameters. The control unit 105 may cause the high-precision parameter update processing unit 104b to perform the parameter update processing for all parameters, or for only some parameters. When causing the high-precision parameter update processing unit 104b to perform the parameter update processing of only some of the parameters, the control unit 105 may cause the high-efficiency parameter update processing unit 104a to perform the parameter update processing of all of the remaining parameters, or of only a part of the remaining parameters. In the latter case, the parameter update processing itself is omitted for the parameters that are assigned to neither processing unit.

The high-efficiency parameter update processing unit 104a and the high-precision parameter update processing unit 104b execute the parameter update processing of the designated parameter according to the instruction from the control unit 105.

Finally, the control unit 105 stores the learned model including the parameter updated in step S13 in the learned model storage unit 107 (step S14).
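The assignment rules of steps S12 and S13 can be summarized by the following sketch of how a control unit might issue calculation instructions. The function name, the tuple format of the instructions, the per-layer (rather than per-unit or per-parameter) granularity, and the output-side-first order of the update instructions are all assumptions made for this example.

```python
def plan_instructions(n_layers, low_inference_layers, high_update_layers,
                      low_update_layers=frozenset()):
    """Return a list of (step, layer, processing_unit) calculation instructions."""
    if not low_inference_layers:
        raise ValueError("unit 103a must handle inference for at least some layers")
    if not high_update_layers:
        raise ValueError("unit 104b must handle updates for at least some layers")
    if set(high_update_layers) & set(low_update_layers):
        raise ValueError("a layer cannot be updated at both accuracies")

    plan = []
    # Step S12: forward propagation over layers 1..n in order.
    for layer in range(1, n_layers + 1):
        unit = "103a" if layer in low_inference_layers else "103b"
        plan.append(("S12", layer, unit))
    # Step S13: parameter update; layers in neither update range are skipped.
    for layer in range(n_layers, 0, -1):
        if layer in high_update_layers:
            plan.append(("S13", layer, "104b"))
        elif layer in low_update_layers:
            plan.append(("S13", layer, "104a"))
    return plan


if __name__ == "__main__":
    # Four layers: low-accuracy inference on layers 1 to 3, high-accuracy update
    # on layer 4, low-accuracy update on layer 3, and no update on layers 1 and 2.
    for instruction in plan_instructions(4, {1, 2, 3}, {4}, {3}):
        print(instruction)
```

With these example arguments, layer 4 is inferred by the unit 103b and updated by the unit 104b, layer 3 is updated by the unit 104a, and the parameter update processing is omitted for layers 1 and 2.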

As another variation of the above operation, for example, when a plurality of pieces of learning data are held, the operations of steps S11 to S14 may be repeated for the number of pieces of learning data. In this case, the learned model as a learning result for the immediately preceding learning data is used as a pre-learning model of learning for the next learning data.

In addition, for example, when a plurality of pieces of learning data are held, the operations of steps S12 to S13 can be repeatedly performed for the number of pieces of learning data.

Further, regardless of the number of learning data, it is also possible to repeat the above-described operation of step S11 to step S14 or the operation of step S12 to step S14 a plurality of times using the same learning data (epoch processing).

Further, in the forward propagation in step S12, the range in which inference processing is performed with low calculation accuracy (low-precision inference range) may, for example, be determined in advance, be specified by the user, or be changed at each processing (for each piece of learning data or each repetition of the epoch processing).

In the parameter update processing in step S13, for example, the range in which the parameter update processing is performed with high calculation accuracy (high-precision parameter update range) may be limited to only the fully connected layers. In addition, for example, the high-precision parameter update range, the range in which parameter update processing is performed with low calculation accuracy (low-precision parameter update range), and the range in which parameter update processing is not performed may each be determined in advance, be specified by the user, or be changed at each processing (for each piece of learning data or each repetition of the epoch processing).

FIGS. 12 and 13 are flowcharts showing more specific operation examples of the learning device 100 of the present embodiment. The operation examples shown in FIGS. 12 and 13 illustrate the operation of each step with a focus on the hardware constituting the learning device 100. The hardware configuration is assumed to be the configuration shown in FIG.

In the example shown in FIG. 12, first, the low-precision arithmetic circuit 11 as the high-efficiency inference processing unit 103a reads the learning data and the pre-learning model from the memory 13 in response to an instruction from the control device 14 as the control unit 105 (step S111).

Next, the low-precision arithmetic circuit 11 performs a part of the forward propagation (in this example, the inference operation for calculating the output of each unit included in each of the first to (k-1)-th layers) with low operation accuracy (step S112). Then, the low-precision arithmetic circuit 11 stores the arithmetic result of step S112 (in this example, the output from each unit of the (k-1)-th layer) in the memory 13 (step S113).

In this example, it is assumed that the pre-learning model is a neural network having a multilayer structure of n + 1 layers from the 0th layer to the n-th layer, with the input layer being the 0th layer and the output layer being the n-th layer. The (k-1)-th layer is an intermediate layer that is downstream of the input layer (0th layer) and upstream of the output layer (n-th layer). That is, k is an integer satisfying 0 < k-1 < n.

Next, the high-precision arithmetic circuit 12 as the high-precision inference processing unit 103b reads the operation result (the output from each unit of the (k-1)-th layer) stored in step S113 according to the instruction of the control device 14 (step S211).

Then, the high-precision arithmetic circuit 12 performs the continuation of the forward propagation (in this example, the inference operation for calculating the output of each unit included in each of the k-th to n-th layers) with high operation accuracy (step S212).

Next, the high-precision arithmetic circuit 12, serving as the high-precision parameter update processing unit 104b, performs with high operation accuracy, in accordance with an instruction from the control device 14, a parameter update operation for updating the parameters (such as connection weights with other units) in each unit included in some of the layers (in this example, the k-th to n-th layers) (step S212). Then, the high-precision arithmetic circuit 12 stores the arithmetic result of step S212 (in this example, the updated parameters in each unit included in each of the k-th to n-th layers) in the memory 13 (step S213).

The updated parameter stored as the calculation result in step S213 corresponds to the learned model described above.

The example shown in FIG. 12 is an operation example in which the low-precision arithmetic circuit 11 first performs inference processing on some layers as the high-efficiency inference processing unit 103a, and then the high-precision arithmetic circuit 12 performs, as the high-precision inference processing unit 103b and the high-precision parameter update processing unit 104b, inference processing and parameter update processing for the remaining layers.
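The hand-off between the two circuits via the memory 13 in the operation of FIG. 12 can be sketched as follows. This is a simplified emulation under several assumptions that are not part of the description: each layer consists of a single unit computing z = a + w * x, the error function is the squared error, the update is a gradient-descent step, the accuracies are 16-bit and 32-bit floating point, and the memory is modeled as a Python dictionary. All names are hypothetical.

```python
import struct


def to_f16(v: float) -> float:
    return struct.unpack("e", struct.pack("e", v))[0]   # assumed low accuracy


def to_f32(v: float) -> float:
    return struct.unpack("f", struct.pack("f", v))[0]   # assumed high accuracy


def unit_output(a, w, x, cast):
    """z = a + w * x with every value rounded by `cast`."""
    return cast(cast(a) + cast(cast(w) * cast(x)))


def low_precision_circuit(memory, k):
    x, _ = memory["learning_data"]                # S111: read data and model
    model = memory["model"]
    if not 1 < k <= max(model):
        raise ValueError("k must satisfy 0 < k-1 < n")
    y = x
    for layer in range(1, k):                     # S112: layers 1 .. k-1
        a, w = model[layer]
        y = unit_output(a, w, y, to_f16)
    memory["handoff"] = y                         # S113: store layer k-1 output


def high_precision_circuit(memory, k, lr=0.1):
    y = memory["handoff"]                         # S211: read layer k-1 output
    _, target = memory["learning_data"]
    model = memory["model"]
    n = max(model)
    layer_inputs = {}
    for layer in range(k, n + 1):                 # S212: forward, layers k .. n
        a, w = model[layer]
        layer_inputs[layer] = y
        y = unit_output(a, w, y, to_f32)
    grad = to_f32(2.0 * (y - target))             # assumed squared-error gradient
    updated = {}
    for layer in range(n, k - 1, -1):             # S212: update, layers n .. k
        a, w = model[layer]
        updated[layer] = (to_f32(a - lr * grad),
                          to_f32(w - lr * grad * layer_inputs[layer]))
        grad = to_f32(grad * w)                   # propagate to the layer below
    memory["model"] = {**model, **updated}        # S213: store updated parameters


if __name__ == "__main__":
    memory = {"model": {1: (0.0, 1.0), 2: (0.0, 1.0), 3: (0.0, 1.0)},
              "learning_data": (1.0, 2.0)}        # (input, target)
    low_precision_circuit(memory, k=3)
    high_precision_circuit(memory, k=3)
    print(memory["model"])
```

Only the parameters of the k-th to n-th layers change in `memory["model"]`; the parameters of the first to (k-1)-th layers are left as read, matching the case in which the parameter update processing is limited to some layers.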

In the example shown in FIG. 13, first, the low-precision arithmetic circuit 11 as the high-efficiency inference processing unit 103a reads the learning data and the pre-learning model from the memory 13 in accordance with an instruction from the control device 14 as the control unit 105 (step S121).

Next, the low-precision arithmetic circuit 11 performs the forward propagation (in this example, the inference operation for calculating the output of each unit included in each of the first to n-th layers) with low operation accuracy (step S122). Then, the low-precision arithmetic circuit 11 stores the arithmetic result of step S122 (in this example, the output from the units of the n-th layer, which is the output layer) in the memory 13 (step S123).

Note that, also in this example, the pre-learning model is a neural network having a multilayer structure of n + 1 layers from the 0th layer to the n-th layer, with the input layer being the 0th layer and the output layer being the n-th layer.

Next, the high-precision arithmetic circuit 12 as the high-precision inference processing unit 103b reads out the operation result (the output from the units of the n-th layer, which is the output layer) stored in step S123 according to the instruction of the control device 14 (step S221).

Next, in response to an instruction from the control device 14, the high-precision arithmetic circuit 12 performs, with high calculation accuracy, a parameter update operation for updating the parameters (such as connection weights with other units) in each unit included in some of the layers (in this example, the k-th to n-th layers) (step S222). Then, the high-precision arithmetic circuit 12 stores the arithmetic result of step S222 (in this example, the updated parameters in each unit included in each of the k-th to n-th layers) in the memory 13 (step S223).

The updated parameter stored as the calculation result in step S223 corresponds to the learned model described above.

The example shown in FIG. 13 is an operation example in which the low-precision arithmetic circuit 11 first performs inference processing on all layers as the high-efficiency inference processing unit 103a, and then the high-precision arithmetic circuit 12 performs parameter update processing for some layers as the high-precision parameter update processing unit 104b.

Note that, after step S213 in FIG. 12 or step S223 in FIG. 13, the low-precision arithmetic circuit 11 may further perform the operation shown in FIG. 14 as the high-efficiency parameter update processing unit 104a.

That is, the low-precision arithmetic circuit 11, as the high-efficiency parameter update processing unit 104a, reads out the updated parameters in the units included in the k-th to n-th layers stored in the memory 13 (step S231).

Next, the low-precision arithmetic circuit 11 performs, with low operation accuracy, a parameter update operation for updating the parameters (such as connection weights with other units) in each unit included in the remaining layers (in this example, the first to (k-1)-th layers) (step S232). Then, the low-precision arithmetic circuit 11 stores the arithmetic result of step S232 (in this example, the updated parameters in each unit included in each of the first to (k-1)-th layers) in the memory 13 (step S233).

In the case of this example, the updated parameters stored as the calculation results in step S213 or S223 and the updated parameters stored as the calculation results in step S233 correspond to the learned model described above.

The operations shown in FIGS. 12 to 14 are examples of learning processing for one piece of learning data. Therefore, when a plurality of pieces of learning data are held, the above-described operations, or the individual operation steps included in them, can be repeated for the number of pieces of learning data. Also, regardless of the number of pieces of learning data, the above operations, or the individual operation steps included in them, can be repeated a plurality of times using the same learning data (epoch processing). Further, the k-th to n-th layers, which are the high-precision parameter update range in the above operations, may be the fully connected layers, or k may be specified by the user or changed every time processing is performed.

As described above, according to the present embodiment, the calculation processing of the learning algorithm is divided into inference processing and parameter update processing, at least a part of the inference processing is calculated with low calculation accuracy, and at least one of the parameter update processing is performed. By operating the unit with high operation accuracy, an operation part requiring high operation accuracy can be optimized, so that it is possible to perform learning with sufficient accuracy while reducing power consumption.

Embodiment 2.
Next, a second embodiment of the present invention will be described. FIG. 15 is a block diagram illustrating a configuration example of a main part of the data processing device according to the second embodiment. The data processing device 300 illustrated in FIG. 15 includes a low-precision arithmetic processing unit 31, a high-precision arithmetic processing unit 32, a communication path 33, and a data conversion unit 34.

The low-precision arithmetic processing unit 31 is a processing unit that performs a predetermined operation with relatively low operation accuracy. Here, the relatively low operation accuracy may be any operation accuracy lower than that of the operation performed by the high-precision arithmetic processing unit 32.

The high-precision arithmetic processing unit 32 is a processing unit that performs a predetermined operation with relatively high operation accuracy. Here, the relatively high operation accuracy may be any operation accuracy higher than that of the operation performed by the low-precision arithmetic processing unit 31.

The low-precision arithmetic processing unit 31 may be, for example, the high-efficiency inference processing unit 103a or the high-efficiency parameter update processing unit 104a. The high-precision arithmetic processing unit 32 may be, for example, the high-precision inference processing unit 103b or the high-precision parameter update processing unit 104b. Also in the present embodiment, "accuracy" or "operation accuracy" refers to a measure of the width and fineness of the range of the numerical data used in the operations actually performed by the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32 (more specifically, a measure of the width and fineness of the range of numerical data determined by the bit width and the handling of the decimal point in the arithmetic circuit that implements the processing unit).

In this example, the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32 are connected via a communication path 33 and a data conversion unit 34. The data conversion unit 34 is provided between the high-precision arithmetic processing unit 32 and the communication path 33.

The communication path 33 may be realized by, for example, a bus. Note that the communication path 33 may be realized by a connection circuit (Inter-connect) provided inside the chip. Further, the communication path 33 may include not only a bus and a connection circuit but also a memory (such as an external memory and a buffer) connected to the bus and the connection circuit.

The data conversion unit 34 performs a predetermined conversion process on data exchanged between the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32. The data conversion by the data conversion unit 34 is performed, for example, so that the communication (data exchange) on the communication path 33 uses data of whichever operation accuracy gives the smaller data amount (communication amount per piece of data).

For example, the data conversion unit 34 converts the transmitted and received data so that each piece of data passing through the communication path 33 has whichever of the operation accuracy of the low-precision arithmetic processing unit 31 and the operation accuracy of the high-precision arithmetic processing unit 32 gives the smaller data amount. If the data amounts are the same, the transmitted and received data are converted to the lower operation accuracy. In the configuration shown in FIG. 15, the data conversion unit 34 may perform data conversion such that each piece of data passing through the communication path 33 has the operation accuracy of the low-precision arithmetic processing unit 31. Matching the lower operation accuracy eliminates the need for data conversion on the low-precision arithmetic processing unit 31 side while minimizing the loss of accuracy caused by the data exchange.

Here, the data conversion includes type conversion that matches the data type to that of the processing unit at the communication end point with the lower operation accuracy, data compression (in particular, numerical data compression such as compression of a numerical sequence or reduction of the number of digits), and synthesis of two or more pieces of converted data.

The data conversion unit 34 receives, for example, data transmitted from the low-precision arithmetic processing unit 31 to the high-precision arithmetic processing unit 32 via the communication path 33, converts the received data (low-operation-accuracy data) into data of the operation accuracy handled by the high-precision arithmetic processing unit 32 (high operation accuracy), and passes it to the high-precision arithmetic processing unit 32. The data conversion unit 34 also receives, for example, data to be transmitted from the high-precision arithmetic processing unit 32 to the low-precision arithmetic processing unit 31, converts the received data (high-operation-accuracy data) into data of the operation accuracy handled by the low-precision arithmetic processing unit 31 (low operation accuracy), and sends it out to the communication path 33.
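The two conversion directions of the data conversion unit 34 can be sketched as a pair of functions, assuming the path carries 16-bit integers and the high-precision side works in 32-bit floating point. The fixed-point interpretation of the integer (the scale `FRAC_BITS`), the saturation on overflow, and the function names are assumptions made for this illustration; the text does not specify how integer and floating-point values correspond.

```python
import struct

FRAC_BITS = 8  # hypothetical scale: the INT16 value is read as Q8.8 fixed point
INT16_MIN, INT16_MAX = -32768, 32767


def wire_to_high(raw):
    """Path -> high-precision side: INT16 fixed point to an FP32 value."""
    value = raw / (1 << FRAC_BITS)
    # Round-trip through a 4-byte float so the result is a valid FP32 value.
    return struct.unpack("<f", struct.pack("<f", value))[0]


def high_to_wire(value):
    """High-precision side -> path: FP32 value to INT16 fixed point."""
    scaled = round(value * (1 << FRAC_BITS))
    # Saturate instead of wrapping when the value does not fit in INT16.
    return max(INT16_MIN, min(INT16_MAX, scaled))
```

Because the data on the path is already in the format of the low-precision side, no conversion is needed there in this sketch, which mirrors the FIG. 15 configuration.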

For example, when the operation accuracy (here, the data type of the numerical values used in the operation) of the low-precision arithmetic processing unit 31 is 16-bit integer (INT16) and the operation accuracy of the high-precision arithmetic processing unit 32 is 32-bit floating point (FP32), the data conversion unit 34 may perform data conversion so that the data passing through the communication path 33 is 16-bit integer data. Likewise, when the operation accuracy of the low-precision arithmetic processing unit 31 is 16-bit integer (INT16) and the operation accuracy of the high-precision arithmetic processing unit 32 is 16-bit floating point (FP16), the data conversion unit 34 may perform data conversion so that the data passing through the communication path 33 is 16-bit integer data.
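The rule for choosing the format on the communication path (smaller data amount first, lower operation accuracy on a tie) can be written as a short selection function. The `NumFormat` type and the `accuracy_rank` ordering are hypothetical; the ranking INT16 < FP16 < FP32 is assumed here only so that it agrees with the two examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumFormat:
    name: str
    bits: int           # data amount per value
    accuracy_rank: int  # hypothetical ordering: larger means higher accuracy


INT16 = NumFormat("INT16", 16, 0)
FP16 = NumFormat("FP16", 16, 1)
FP32 = NumFormat("FP32", 32, 2)


def choose_wire_format(low_side, high_side):
    """Pick the format used on the communication path.

    The smaller data amount wins; if both are equal, the format with the
    lower operation accuracy is chosen.
    """
    return min((low_side, high_side), key=lambda f: (f.bits, f.accuracy_rank))
```

For the two cases in the text, `choose_wire_format(INT16, FP32)` and `choose_wire_format(INT16, FP16)` both select INT16.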

The communication of data passing through the communication path 33 may be one-way (for example, only transmission from the low-precision arithmetic processing unit 31 to the high-precision arithmetic processing unit 32, or only transmission from the high-precision arithmetic processing unit 32 to the low-precision arithmetic processing unit 31). In this case, the data conversion unit 34 only needs to perform the data conversion corresponding to the communication actually performed.

In the present embodiment, the data conversion unit 34 may be realized by a dedicated data conversion circuit designed according to the configuration and operation of the data processing device 300. Implementing the data conversion unit 34 as a dedicated circuit makes it possible to omit processing needed only for generality, such as reading set values and states and branching on them, which improves efficiency. A dedicated circuit also makes it easy to convert a plurality of pieces of data at once, or to convert a plurality of pieces of data in parallel, collect the results, and transmit them together, which improves efficiency further. Here, converting data in parallel and collecting the results is one example of data conversion that combines the conversion of each piece of data with the synthesis of the converted data. The data conversion unit 34 may realize such data conversion by, for example, SIMD (single instruction, multiple data) operations.

FIG. 16 is a block diagram showing another configuration example of the data processing device of the second embodiment. In the example illustrated in FIG. 15, the data conversion unit 34 is provided only on the high-precision arithmetic processing unit 32 side, but a data conversion unit may also be provided on the low-precision arithmetic processing unit 31 side. The data processing device 300 shown in FIG. 16 differs from the configuration shown in FIG. 15 in that a data conversion unit 35 is further provided between the low-precision arithmetic processing unit 31 and the communication path 33. That is, in this example, the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32 are connected via the data conversion unit 35, the communication path 33, and the data conversion unit 34.

The data conversion unit 35 performs a predetermined conversion process on data exchanged between the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32.

In this example, the data conversion by the data conversion unit 34 and the data conversion unit 35 is performed so that the communication (data exchange) on the communication path 33 uses a data amount smaller than that of data communication at the operation accuracy handled by the low-precision arithmetic processing unit 31.

For example, the data conversion unit 34 and the data conversion unit 35 convert the transmitted and received data so that each piece of data passing through the communication path 33 has an operation accuracy (hereinafter, ultra-low operation accuracy) whose data amount is smaller than that of data at the operation accuracy of the low-precision arithmetic processing unit 31.

In this example, the data conversion unit 34 receives, for example, data transmitted from the low-precision arithmetic processing unit 31 to the high-precision arithmetic processing unit 32 via the data conversion unit 35 and the communication path 33, converts the received data (ultra-low-operation-accuracy data produced by the data conversion unit 35) into data of the operation accuracy handled by the high-precision arithmetic processing unit 32 (high operation accuracy), and passes it to the high-precision arithmetic processing unit 32. The data conversion unit 34 also receives, for example, data to be transmitted from the high-precision arithmetic processing unit 32 to the low-precision arithmetic processing unit 31, converts the received data (high-operation-accuracy data) into ultra-low-operation-accuracy data, and sends it out to the communication path 33.

Similarly, the data conversion unit 35 receives, for example, data transmitted from the high-precision arithmetic processing unit 32 to the low-precision arithmetic processing unit 31 via the data conversion unit 34 and the communication path 33, converts the received data (ultra-low-operation-accuracy data produced by the data conversion unit 34) into data of the operation accuracy handled by the low-precision arithmetic processing unit 31 (low operation accuracy), and passes it to the low-precision arithmetic processing unit 31. The data conversion unit 35 also receives, for example, data to be transmitted from the low-precision arithmetic processing unit 31 to the high-precision arithmetic processing unit 32, converts the received data (low-operation-accuracy data) into ultra-low-operation-accuracy data, and sends it out to the communication path 33.

For example, when the operation accuracy (here, the data type of the numerical values used in the operation) of the low-precision arithmetic processing unit 31 is 16-bit integer (INT16) and the operation accuracy of the high-precision arithmetic processing unit 32 is 32-bit floating point (FP32), the data conversion unit 34 and the data conversion unit 35 may compress the data passing through the communication path 33 into 12-bit integer (INT12) or 8-bit integer (INT8) data, which have a smaller data amount than INT16. When performing data compression, the data conversion unit 34 and the data conversion unit 35 perform numerical data compression that reduces only the accuracy (for example, by discarding lower bits), so that the data does not lose its meaning as numerical data.
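Discarding lower bits can be illustrated with a minimal sketch for the INT16 to INT8 case. The function names and the choice of truncation (rather than rounding to nearest) are assumptions for this example; the text only says that lower bits are reduced.

```python
DROPPED_BITS = 8  # INT16 -> INT8 keeps the upper 8 bits


def compress_int16_to_int8(x):
    """Discard the lower 8 bits of a signed 16-bit value."""
    if not -32768 <= x <= 32767:
        raise ValueError("value does not fit in INT16")
    # Python's >> on negative ints is an arithmetic (sign-preserving) shift.
    return x >> DROPPED_BITS


def expand_int8_to_int16(y):
    """Restore the magnitude; the discarded lower bits come back as zeros."""
    if not -128 <= y <= 127:
        raise ValueError("value does not fit in INT8")
    return y << DROPPED_BITS
```

The value keeps its sign and approximate magnitude after the round trip, and the error is always smaller than 2^8, which is the sense in which only the accuracy is reduced.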

The data conversion unit 34 and the data conversion unit 35 can also perform data compression that exploits the characteristics of the activation functions used in deep learning. For example, when a step function, which is one of the activation functions, is used, each output can be compressed to 1 bit. When ReLU (ramp function) is used, the output is never negative, so the sign bit of the data can be omitted.
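Both activation-based compressions can be sketched briefly. The bit order within a byte (least significant bit first) and all function names are assumptions chosen for this illustration.

```python
def pack_step_outputs(bits):
    """Pack step-function outputs (each 0 or 1) into bytes, 8 outputs per byte."""
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            out[i // 8] |= 1 << (i % 8)  # assumed order: LSB first
    return bytes(out)


def unpack_step_outputs(data, count):
    """Recover the first `count` step-function outputs from packed bytes."""
    return [(data[i // 8] >> (i % 8)) & 1 for i in range(count)]


def drop_sign_bit(x):
    """ReLU output in INT16 is never negative, so 15 bits carry all of it."""
    if not 0 <= x <= 32767:
        raise ValueError("expected a non-negative INT16 value")
    return x & 0x7FFF
```

Eight step-function outputs that would occupy sixteen bytes as INT16 values occupy a single byte after packing.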

Further, when performing data conversion that reduces the number of bits, the data conversion unit 34 and the data conversion unit 35 may perform a process of packing together a plurality of pieces of data having an irregular number of bits, and a process of decomposing data packed in this way back into the individual pieces of data (pack/unpack processing). Specializing the pack/unpack processing for the formats in use improves its efficiency.
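Pack/unpack processing for a bit width that does not align to bytes can be sketched for signed 12-bit values, two of which fit exactly into three bytes. The little-endian layout, the zero padding for an odd count, and the names `pack12` and `unpack12` are assumptions for this example.

```python
def pack12(values):
    """Pack signed 12-bit integers, two values per 3 bytes (little-endian)."""
    for v in values:
        if not -2048 <= v <= 2047:
            raise ValueError("value does not fit in 12 bits")
    padded = list(values) + [0] * (len(values) % 2)  # pad to an even count
    out = bytearray()
    for a, b in zip(padded[0::2], padded[1::2]):
        word = (a & 0xFFF) | ((b & 0xFFF) << 12)  # 24-bit container
        out += word.to_bytes(3, "little")
    return bytes(out)


def unpack12(data, count):
    """Decompose packed bytes back into `count` signed 12-bit integers."""
    values = []
    for offset in range(0, len(data), 3):
        word = int.from_bytes(data[offset:offset + 3], "little")
        for raw in (word & 0xFFF, word >> 12):
            # Sign-extend from 12 bits.
            values.append(raw - 0x1000 if raw & 0x800 else raw)
    return values[:count]
```

Four 12-bit values occupy 6 bytes in this layout, compared with 8 bytes if each were stored in a 16-bit container.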

By reducing the amount of data passing through the communication path 33 in this way, data exchange between cores (arithmetic circuits) having different operation accuracies can be speeded up. Furthermore, when data exchange between cores is performed via a memory, the amount of memory used for the exchange can be reduced, and thus the power consumed by memory use can be reduced.

When there are two or more combinations of cores with different operation accuracies that exchange data, the data conversion unit 34 or the data conversion unit 35 may be provided, for each combination, at one or both end points of the communication path used for inter-core communication in that combination.

Next, the outline of the present invention will be described. FIG. 17 is a block diagram showing an outline of the data processing device of the present invention. The data processing device 500 shown in FIG. 17 includes a low-precision arithmetic processing unit 501, a high-precision arithmetic processing unit 502, and a first data conversion unit 504.

The low-precision arithmetic processing unit 501 (for example, the low-precision arithmetic processing unit 31) performs a predetermined operation with a first accuracy.

The high-precision arithmetic processing unit 502 (for example, the high-precision arithmetic processing unit 32) performs a predetermined operation with a second accuracy higher than the first accuracy.

The first data conversion unit 504 (for example, the data conversion unit 34) is provided at the end point, on the high-precision arithmetic processing unit 502 side, of the communication path 503 for transferring data between the high-precision arithmetic processing unit 502 and the low-precision arithmetic processing unit 501.

The first data conversion unit 504 performs a predetermined conversion on data passing between the communication path 503 and the high-precision arithmetic processing unit 502 so that the data passed to and from the high-precision arithmetic processing unit 502 at the connection destination is data that can be handled by the high-precision arithmetic processing unit 502, the amount of data passing through the communication path 503 is equal to or less than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path 503 is equal to or less than the first accuracy.

With such a configuration, efficiency can be improved even in processing in which operations requiring high accuracy and operations not requiring high accuracy are mixed.

FIG. 18 is a configuration diagram showing a configuration example of the data processing circuit of the present invention. The data processing circuit 600 illustrated in FIG. 18 includes a low-precision arithmetic circuit 601, a high-precision arithmetic circuit 602, and a first data conversion circuit 604.

The low-precision arithmetic circuit 601 (for example, the low-precision arithmetic processing unit 31 or the low-precision arithmetic circuit 11) performs a predetermined operation with a first accuracy.

The high-precision arithmetic circuit 602 (for example, the high-precision arithmetic processing unit 32 or the high-precision arithmetic circuit 12) performs a predetermined operation with a second accuracy higher than the first accuracy.

The first data conversion circuit 604 (for example, the data conversion unit 34) is provided at the end point, on the high-precision arithmetic circuit 602 side, of the communication path 603 for transferring data between the high-precision arithmetic circuit 602 and the low-precision arithmetic circuit 601, and performs a predetermined conversion on data passing between the communication path 603 and the high-precision arithmetic circuit 602.

In such a data processing circuit 600, the data passed between the first data conversion circuit 604 and the high-precision arithmetic circuit 602 at the connection destination is data handled by the high-precision arithmetic circuit 602, the amount of data passing through the communication path 603 is equal to or less than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path 603 is equal to or less than the first accuracy.

Even with such a configuration, processing in which operations requiring high accuracy and operations not requiring high accuracy are mixed can be performed efficiently.

As shown in FIG. 19, the data processing circuit 600 may further include a second data conversion circuit 605 that is provided at the end point of the communication path 603 on the low-precision arithmetic circuit 601 side and performs a predetermined conversion on data passing between the communication path 603 and the low-precision arithmetic circuit 601.

In such a data processing circuit 600, the data passed between the second data conversion circuit 605 and the low-precision arithmetic circuit 601 at the connection destination is data handled by the low-precision arithmetic circuit 601, the amount of data passing through the communication path 603 is smaller than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path 603 is lower than the first accuracy.

According to such a configuration, it is possible to further improve the efficiency of the processing in which the operation requiring high precision and the operation not requiring high precision are mixed.

Note that the above embodiment can also be described as the following supplementary notes.

(Supplementary Note 1) A data processing device comprising: low-precision arithmetic processing means for performing a predetermined operation with a first accuracy; high-precision arithmetic processing means for performing a predetermined operation with a second accuracy higher than the first accuracy; and first data conversion means provided at the end point, on the high-precision arithmetic processing means side, of a communication path for transferring data between the high-precision arithmetic processing means and the low-precision arithmetic processing means, wherein the first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means so that the data passed to and from the high-precision arithmetic processing means is data that can be handled by the high-precision arithmetic processing means, the amount of data passing through the communication path is equal to or less than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path is equal to or less than the first accuracy.

(Supplementary Note 2) The data processing device according to Supplementary Note 1, wherein the first data conversion means receives, from the communication path, data passed from the low-precision arithmetic processing means to the high-precision arithmetic processing means as data of the first accuracy and converts the received data of the first accuracy into data of an accuracy that can be handled by the high-precision arithmetic processing means, and the first data conversion means receives data passed from the high-precision arithmetic processing means to the low-precision arithmetic processing means as data that can be handled by the high-precision arithmetic processing means and converts the received data into data of an accuracy that can be handled by the low-precision arithmetic processing means.

(Supplementary Note 3) The data processing device according to Supplementary Note 1 or 2, further comprising second data conversion means provided at the end point of the communication path on the low-precision arithmetic processing means side, wherein the first data conversion means and the second data conversion means each perform a predetermined conversion on data passing between the communication path and the arithmetic processing means at its connection destination so that the data passed to and from the arithmetic processing means at the connection destination is data that can be handled by that arithmetic processing means, the amount of data passing through the communication path is smaller than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path is lower than the first accuracy.

(Supplementary Note 4) The first data conversion means and the second data conversion means each receive, from the communication path, data passed from the other arithmetic processing means to the arithmetic processing means at its connection destination as data of a predetermined third accuracy lower than the first accuracy, and convert the received data of the third accuracy into data of an accuracy that can be handled by the arithmetic processing means at the connection destination; and the first data conversion means and the second data conversion means each receive data passed from the arithmetic processing means at its connection destination to the other arithmetic processing means as data that can be handled by the arithmetic processing means at the connection destination, and convert the received data into data of an accuracy that can be handled by the other arithmetic processing means. A data processing device according to claim 1.

(Supplementary Note 5) The data processing device according to Supplementary Note 3 or 4, wherein at least one of the first data conversion means and the second data conversion means, when sending converted data to the communication path, sends a plurality of pieces of converted data collectively, and at least one of the first data conversion means and the second data conversion means receives the collected plurality of pieces of converted data from the communication path, decomposes the received data, and converts each piece of data into data of an accuracy that can be handled by the arithmetic processing means at the connection destination.

(Supplementary note 6) The data processing device according to any one of Supplementary notes 3 to 5, wherein conversions performed by the first data conversion unit and the second data conversion unit are predetermined and fixed.

(Supplementary Note 7) The data processing device according to any one of Supplementary Notes 1 to 6, wherein the data processing device is a learning device that learns a predetermined discriminant model composed of two or more units connected in layers, and comprises learning means that, when learning data is input, performs inference processing for calculating the output of each unit of the discriminant model in a predetermined order and parameter update processing for updating, based on the result of the inference processing, at least a part of the parameters used for calculating the output of each unit, and wherein the learning means includes, as the low-precision arithmetic processing means, high-efficiency inference means that performs designated operations among the operations performed in the inference processing with a first operation accuracy, and, as the high-precision arithmetic processing means, high-precision parameter update means that performs designated operations among the operations performed in the parameter update processing with a second operation accuracy higher than the first operation accuracy.

(Supplementary Note 8A) A data processing circuit comprising: a low-precision arithmetic circuit that performs a predetermined operation with a first accuracy; a high-precision arithmetic circuit that performs a predetermined operation with a second accuracy higher than the first accuracy; and a first data conversion circuit that is provided at the end point, on the high-precision arithmetic circuit side, of a communication path for transferring data between the high-precision arithmetic circuit and the low-precision arithmetic circuit and performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic circuit, wherein the data passed between the first data conversion circuit and the high-precision arithmetic circuit at the connection destination is data handled by the high-precision arithmetic circuit, the amount of data passing through the communication path is equal to or less than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path is equal to or less than the first accuracy.

(Supplementary Note 8B) The data processing circuit according to Supplementary Note 8A, further comprising a second data conversion circuit that is provided at the end point of the communication path on the low-precision arithmetic circuit side and performs a predetermined conversion on data passing between the communication path and the low-precision arithmetic circuit, wherein the data passed between the second data conversion circuit and the low-precision arithmetic circuit at the connection destination is data handled by the low-precision arithmetic circuit, the amount of data passing through the communication path is smaller than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path is lower than the first accuracy.

(Supplementary Note 9) A data conversion method, wherein first data conversion means, provided at the end point on the high-precision arithmetic processing means side of a communication path for transferring data between low-precision arithmetic processing means that performs a predetermined operation with a first accuracy and high-precision arithmetic processing means that performs a predetermined operation with a second accuracy higher than the first accuracy, performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means so that the data passed to and from the high-precision arithmetic processing means at the connection destination is data that can be handled by the high-precision arithmetic processing means, the amount of data passing through the communication path is equal to or less than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path is equal to or less than the first accuracy.

(Supplementary Note 10) The data processing method according to Supplementary Note 9, wherein the first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means so that the data passed to and from the high-precision arithmetic processing means at the connection destination is data that can be handled by the high-precision arithmetic processing means, the amount of data passing through the communication path is smaller than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path is lower than the first accuracy, and second data conversion means provided at the end point of the communication path on the low-precision arithmetic processing means side performs a predetermined conversion on data passing between the communication path and the low-precision arithmetic processing means so that the data passed to and from the low-precision arithmetic processing means at the connection destination is data that can be handled by the low-precision arithmetic processing means, the amount of data passing through the communication path is smaller than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path is lower than the first accuracy.

Although the present invention has been described with reference to the exemplary embodiments and examples, the present invention is not limited to the exemplary embodiments and examples. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.

The present invention is not limited to deep learning and can be suitably applied to any device that is to perform, while suppressing power consumption, processing in which operations requiring high accuracy and operations not requiring high accuracy are mixed.

REFERENCE SIGNS LIST
10 arithmetic circuit
11 low-precision arithmetic circuit
12 high-precision arithmetic circuit
13 memory
14 control device
15 bus
51 unit
52 unit connection
53 inference process
54 parameter update process
100 learning device
101 pre-learning model storage unit
102 learning data storage unit
103a high-efficiency inference processing unit
103b high-precision inference processing unit
104a high-efficiency parameter update processing unit
104b high-precision parameter update processing unit
105 control unit
106 learning processing unit
107 post-learning model storage unit
1000 computer
1001 CPU
1002 main storage device
1003 auxiliary storage device
1004 interface
1005 display device
1006 input device
1007 GPU
1008 processor
21 bus
22a, 22b, 22c, 22d arithmetic circuit
221 MAC
222 memory layer
223 squared addition tree
224 ALU
300 data processing device
31 low-precision arithmetic processing unit
32 high-precision arithmetic processing unit
33 communication path
34, 35 data conversion unit
500 data processing device
501 low-precision arithmetic processing unit
502 high-precision arithmetic processing unit
503 communication path
504 first data conversion unit
600 data processing circuit
601 low-precision arithmetic circuit
602 high-precision arithmetic circuit
603 communication path
604 first data conversion circuit
605 second data conversion circuit
90 large-scale learning circuit

Claims

A data processing device comprising:
low-precision arithmetic processing means for performing a predetermined operation with a first accuracy;
high-precision arithmetic processing means for performing a predetermined operation with a second accuracy higher than the first accuracy; and
first data conversion means provided at the end point, on the high-precision arithmetic processing means side, of a communication path for transferring data between the high-precision arithmetic processing means and the low-precision arithmetic processing means,
wherein the first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means so that the data passed to and from the connected high-precision arithmetic processing means is data that can be handled by the high-precision arithmetic processing means, the amount of data passing through the communication path is equal to or less than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path is equal to or less than the first accuracy.
The data processing device according to claim 1, wherein
the first data conversion means receives, from the communication path, data passed from the low-precision arithmetic processing means to the high-precision arithmetic processing means as data of the first accuracy, and converts the received data of the first accuracy into data of an accuracy that can be handled by the high-precision arithmetic processing means, and
the first data conversion means receives data passed from the high-precision arithmetic processing means to the low-precision arithmetic processing means as data that can be handled by the high-precision arithmetic processing means, and converts the received data into data of an accuracy that can be handled by the low-precision arithmetic processing means.
The data processing device according to claim 1, further comprising second data conversion means provided at the end point of the communication path on the low-precision arithmetic processing means side,
wherein each of the first data conversion means and the second data conversion means performs a predetermined conversion on data passing between the communication path and the arithmetic processing means to which it is connected, so that the data passed to and from the connected arithmetic processing means is data that can be handled by that arithmetic processing means, the amount of data passing through the communication path is smaller than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path is lower than the first accuracy.
The data processing device according to claim 3, wherein
each of the first data conversion means and the second data conversion means receives, from the communication path, data passed from the partner arithmetic processing means to the connected arithmetic processing means as data of a predetermined third accuracy lower than the first accuracy, and converts the received data of the third accuracy into data of an accuracy that can be handled by the connected arithmetic processing means, and
each of the first data conversion means and the second data conversion means receives data passed from the connected arithmetic processing means to the partner arithmetic processing means as data that can be handled by the connected arithmetic processing means, and converts the received data into data of an accuracy that can be handled by the partner arithmetic processing means.
The data processing device according to claim 3, wherein
at least one of the first data conversion means and the second data conversion means, when transmitting converted data to the communication path, transmits a plurality of pieces of converted data collectively, and
at least one of the first data conversion means and the second data conversion means receives the collected plurality of pieces of converted data from the communication path, decomposes the received data into individual pieces, and converts the decomposed data into data of an accuracy that can be handled by the connected arithmetic processing means.
The data processing device according to any one of claims 3 to 5, wherein the conversions performed by the first data conversion means and the second data conversion means are predetermined and fixed.
The data processing device according to claim 1, wherein
the data processing device is a learning device that learns a predetermined discriminant model composed of two or more units connected in layers,
the learning device comprises learning means for performing, when learning data is input, inference processing for calculating the output of each unit of the discriminant model in a predetermined order, and parameter update processing for updating at least some of the parameters used for calculating the output of each unit based on the result of the inference processing, and
the learning means includes:
high-efficiency inference means, serving as the low-precision arithmetic processing means, for performing a specified operation among the operations performed in the inference processing with the first operation accuracy; and
high-precision parameter update means, serving as the high-precision arithmetic processing means, for performing a specified operation among the operations performed in the parameter update processing with a second operation accuracy higher than the first operation accuracy.
A data processing circuit comprising:
a low-precision arithmetic circuit that performs a predetermined operation with a first accuracy;
a high-precision arithmetic circuit that performs a predetermined operation with a second accuracy higher than the first accuracy; and
a first data conversion circuit that is provided at the end point, on the high-precision arithmetic circuit side, of a communication path for transferring data between the high-precision arithmetic circuit and the low-precision arithmetic circuit, and that performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic circuit,
wherein the data passed between the first data conversion circuit and the connected high-precision arithmetic circuit is data handled by the high-precision arithmetic circuit, and
the amount of data passing through the communication path is equal to or less than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path is equal to or less than the first accuracy.
A data processing method, wherein
first data conversion means, provided at the end point, on the high-precision arithmetic processing means side, of a communication path for transferring data between low-precision arithmetic processing means that performs a predetermined operation with a first accuracy and high-precision arithmetic processing means that performs a predetermined operation with a second accuracy higher than the first accuracy,
performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means so that the data passed to and from the connected high-precision arithmetic processing means is data that can be handled by the high-precision arithmetic processing means, the amount of data passing through the communication path is equal to or less than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path is equal to or less than the first accuracy.
The data processing method according to claim 9, wherein
the first data conversion means performs the predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means so that the data passed to and from the connected high-precision arithmetic processing means is data that can be handled by the high-precision arithmetic processing means, the amount of data passing through the communication path is smaller than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path is lower than the first accuracy, and
second data conversion means, provided at the end point of the communication path on the low-precision arithmetic processing means side, performs a predetermined conversion on data passing between the communication path and the low-precision arithmetic processing means so that the data passed to and from the connected low-precision arithmetic processing means is data that can be handled by the low-precision arithmetic processing means, the amount of data passing through the communication path is smaller than the data amount when data of the first accuracy is used, and the accuracy of the data passing through the communication path is lower than the first accuracy.
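The conversion performed at both end points of the communication path can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not the claimed implementation: the first accuracy is assumed to be 16 bits per value, the lower accuracy used on the path is assumed to be a signed 8-bit fixed-point code with a fixed step `SCALE`, and the names `encode_for_channel`, `decode_from_channel`, `SCALE` and `FIRST_PRECISION_BYTES` are hypothetical.

```python
import struct

SCALE = 1 / 16             # hypothetical fixed step: 8-bit code k represents k * SCALE
FIRST_PRECISION_BYTES = 2  # assumption: the first accuracy uses 16-bit values

def encode_for_channel(values):
    """Sending-side converter: quantize to signed 8-bit codes and send them as one payload."""
    codes = [max(-128, min(127, round(v / SCALE))) for v in values]
    return struct.pack(f"<{len(codes)}b", *codes)

def decode_from_channel(payload):
    """Receiving-side converter: split the payload and restore values the local unit can handle."""
    codes = struct.unpack(f"<{len(payload)}b", payload)
    return [c * SCALE for c in codes]

# Hypothetical values that are exact multiples of SCALE, so they survive the round trip.
gradients = [0.5, -1.25, 3.0, 0.0625]
payload = encode_for_channel(gradients)   # what travels over the communication path
restored = decode_from_channel(payload)   # what the receiving arithmetic unit sees
```

Here four values travel as a single 4-byte payload, half of the 8 bytes they would occupy at the assumed 16-bit first accuracy. Values that are not multiples of `SCALE`, or that fall outside the 8-bit range, are rounded or clamped, which is the loss of accuracy on the path that this kind of fixed conversion accepts.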