WO2023170855A1 - Neural network device - Google Patents

Neural network device

Info

Publication number
WO2023170855A1
Authority: WIPO (PCT)
Prior art keywords: neural network, unit, circuit, layer, calculation
Application number: PCT/JP2022/010523
Other languages: French (fr), Japanese (ja)
Inventors: 督 那須, 知嘉子 中西
Original Assignee: 三菱電機株式会社 (Mitsubishi Electric Corporation)
Priority date: the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.
Application filed by 三菱電機株式会社 (Mitsubishi Electric Corporation)
Priority to JP2022543749A (patent JP7179237B1)
Priority to PCT/JP2022/010523
Publication of WO2023170855A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks

Definitions

  • The present disclosure relates to artificial intelligence technology, and particularly to an apparatus and method for creating a program that processes a neural network.
  • In fields such as image processing, neural networks can perform processing with extremely high accuracy and have recently come into wide use. At the same time, neural networks involve a large number of calculations and are known to have a high processing load. To complete processing within a desired time, neural networks are therefore often realized on dedicated processors such as GPGPUs (General Purpose Graphics Processing Units) or on hardware such as FPGAs (Field Programmable Gate Arrays) and ASICs (Application Specific Integrated Circuits).
  • Although a neural network includes a large number of operations, it is composed of a combination of operations such as convolutions, activation functions, and full connections, and its structure is relatively simple. Patent Document 1 below (JP 2019-139747 A) proposes a technique that utilizes this characteristic of neural networks to solve the problem of the large amount of calculation.
  • The technique of Patent Document 1 compiles different network structures into operations of the same arithmetic unit by controlling that unit with a single instruction corresponding to each layer operation of the multi-layer computation of a neural network, so that the same device can realize the logic operations of all layers.
  • The technique of Patent Document 1 can use hardware resources efficiently and achieve high-speed neural network processing by chaining single instructions prepared in advance on the hardware according to the network structure. However, because processing is performed with combinations of single instructions prepared in advance on the hardware, there is a concern that processing may not be completed within the desired time.
  • The present disclosure has been made to solve the above problem, and aims to enable the design of neural network processing that can achieve desired performance (execution time and accuracy) on hardware with fewer resources.
  • A neural network device according to the present disclosure includes: a neural network analysis unit that determines, for each operation constituting a neural network, a calculation method, namely whether to implement the operation as a circuit or process it in software; and a neural network calculation method output unit that creates and outputs circuit information for circuitizing the operations determined to be circuitized and a program for software-processing the operations determined to be processed in software. The neural network analysis unit has a network structure analysis unit that analyzes the calculation structure of the neural network, and a neural network division unit that determines, for each operation obtained by dividing the neural network, whether to circuitize the operation or process it in software. The network structure analysis unit has a calculation structure classification unit that classifies each layer of the neural network according to the types of operations constituting the layer, and a convolution layer analysis unit that specifies the parameters of the convolution operation for each layer classified as performing a convolution. The neural network division unit has a convolutional layer circuitization unit that groups layers having identical or similar parameters based on the parameters specified by the convolution layer analysis unit, a circuit scale calculation unit that calculates, for each convolution operation of the grouped layers, the circuit scale when that operation is circuitized, and a circuitization location determination unit that determines the operations to be circuitized based on the circuit scales calculated by the circuit scale calculation unit.
  • FIG. 1 is a block diagram showing the configuration of a neural network device according to Embodiment 1.
  • FIG. 2 is a block diagram showing the configuration of the neural network analysis unit according to Embodiment 1.
  • FIG. 3 is a block diagram showing the configuration of the neural network calculation method output unit according to Embodiment 1.
  • FIG. 4 is a block diagram showing the configuration of the network structure analysis unit according to Embodiment 1.
  • FIG. 5 is a block diagram showing the configuration of the neural network division unit according to Embodiment 1.
  • FIG. 6 is a flowchart showing the processing of the convolutional layer circuitization unit according to Embodiment 1.
  • FIG. 7 is a flowchart showing the processing of the activation layer circuitization unit according to Embodiment 1.
  • FIG. 8 is a flowchart showing the processing of the circuit scale calculation unit according to Embodiment 1.
  • FIGS. 9 and 10 are diagrams each showing an example of the hardware configuration of the neural network construction unit.
  • FIG. 11 is a block diagram showing the configuration of a neural network device according to Embodiment 2.
  • FIG. 1 is a block diagram showing the configuration of the neural network device 100 according to Embodiment 1.
  • As shown in FIG. 1, the neural network device 100 includes a neural network construction unit 101, which has a neural network analysis unit 102, a neural network calculation method output unit 103, and a storage unit 104.
  • The neural network analysis unit 102 reads the network structure data of the neural network stored in the storage unit 104, analyzes the network structure, determines a calculation method, including the program and circuits for operating the neural network, and outputs the determined calculation method. That is, the neural network analysis unit 102 determines, for each operation constituting the neural network, a calculation method, namely whether to circuitize the operation or process it in software.
  • Based on the calculation method received from the neural network analysis unit 102, the neural network calculation method output unit 103 creates and outputs the data of a program that runs on a processor such as a CPU (Central Processing Unit) and circuit information for constructing arithmetic circuits on an FPGA. That is, the neural network calculation method output unit 103 creates and outputs circuit information for circuitizing the operations that the neural network analysis unit 102 determined to circuitize, and a program for software-processing the operations that it determined to process in software.
  • FIG. 2 is a block diagram showing the configuration of the neural network analysis unit 102.
  • As shown in FIG. 2, the neural network analysis unit 102 includes a network structure analysis unit 201 and a neural network division unit 202.
  • The network structure analysis unit 201 analyzes the calculation structure within the network based on the network structure data of the neural network read from the storage unit 104, and outputs the analysis result.
  • The neural network division unit 202 receives the analysis result of the calculation structure in the network from the network structure analysis unit 201, divides the operations constituting the network structure into operation units of a predetermined size, determines whether each divided operation is to be processed by the CPU or by the FPGA (that is, software-processed or circuitized), and outputs the determination result in association with each divided operation.
  • FIG. 3 is a block diagram showing the configuration of the neural network calculation method output unit 103.
  • As shown in FIG. 3, the neural network calculation method output unit 103 includes a control program creation unit 301, an arithmetic circuit creation unit 302, and a data acquisition circuit control data generation unit 303.
  • The control program creation unit 301 receives the calculation method for each process of the neural network determined by the neural network analysis unit 102, and creates and outputs a CPU program for the operations to be software-processed on the CPU.
  • The program also includes a control program that manages the input and output of the arithmetic circuits running on the FPGA and, by controlling the FPGA, enables processing of the entire neural network.
  • The control program performs processing such as, for example, inputting data to arithmetic circuit A, receiving the calculation result from arithmetic circuit A, and inputting it to arithmetic circuit B.
  • Part of the calculation processing may also be included in the control program so that it is performed on the CPU rather than on the FPGA.
  • In that case, the control program performs processing such as, for example, calculating the product of the outputs of arithmetic circuit A and arithmetic circuit B.
  • The program output by the control program creation unit 301 is, for example, the binary data of an executable program obtained by compiling code written in C or the like with a compiler for a specific CPU. A sketch of what such a control program might look like appears below.
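  • The following is a minimal C sketch of the control-program flow described above. The helper functions fpga_write_input and fpga_read_output and the circuit identifiers are assumptions introduced for illustration; the actual shared-memory interface is whatever the control program creation unit 301 generates.

```c
#include <stddef.h>

/* Hypothetical shared-memory helpers, not from the patent. */
extern void fpga_write_input(int circuit_id, const float *data, size_t n);
extern void fpga_read_output(int circuit_id, float *data, size_t n);

#define CIRCUIT_A 0
#define CIRCUIT_B 1

/* Feed data through circuit A, pass its result on to circuit B, and
 * combine the two outputs on the CPU (a dot product here, standing in
 * for the "product of the outputs" example above). */
float run_fragment(const float *in, size_t n, float *buf_a, float *buf_b)
{
    fpga_write_input(CIRCUIT_A, in, n);     /* input data to circuit A */
    fpga_read_output(CIRCUIT_A, buf_a, n);  /* receive result from A   */
    fpga_write_input(CIRCUIT_B, buf_a, n);  /* forward it to circuit B */
    fpga_read_output(CIRCUIT_B, buf_b, n);

    float combined = 0.0f;                  /* CPU-side part of the work */
    for (size_t i = 0; i < n; i++)
        combined += buf_a[i] * buf_b[i];
    return combined;
}
```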
  • The arithmetic circuit creation unit 302 receives the calculation method, including the program and circuits for operating the neural network, determined by the neural network analysis unit 102, and creates and outputs circuit information for constructing the arithmetic circuits that operate on the FPGA.
  • The data acquisition circuit control data generation unit 303 calculates, based on the circuit information, the parameters to be provided to a dedicated circuit that receives data from shared memory or the like, which is used when the FPGA receives data from the CPU. Suppose, for example, that an arithmetic circuit A on the FPGA receives data of a fixed size (data width) and calculates the average of a certain number of data values contained in it, and that circuit A starts its operation as soon as data of that size has been stored in the shared memory, without receiving an execution command from the CPU; that is, circuit A acquires its data autonomously. The parameters above serve to build the "fixed size" value that enables this operation into the arithmetic circuit on the FPGA in advance, as sketched below.
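  • A small C sketch of how that "fixed size" parameter might be derived from the circuit information; the struct fields are illustrative assumptions, not names from the patent.

```c
#include <stddef.h>

struct circuit_info {
    int    circuit_id;
    size_t inputs_per_run;  /* values one invocation of the circuit consumes   */
    size_t bytes_per_input; /* e.g. sizeof(float), or 2 for 16-bit fixed point */
};

/* The autonomous-start threshold: once this many bytes are present in
 * shared memory, the circuit may begin computing without a CPU command. */
size_t acquisition_size_bytes(const struct circuit_info *ci)
{
    return ci->inputs_per_run * ci->bytes_per_input;
}
```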
  • FIG. 4 is a block diagram showing the configuration of the network structure analysis unit 201.
  • As shown in FIG. 4, the network structure analysis unit 201 includes a calculation structure classification unit 401, a convolution layer analysis unit 402, and an activation layer analysis unit 403.
  • Based on the network structure data of the neural network read from the storage unit 104, the calculation structure classification unit 401 analyzes what kinds of operations each layer in the neural network is composed of, and outputs the analysis result as calculation information associated with each layer.
  • The operation associated with a layer is, for example, a convolution operation or an operation using an activation function.
  • The operations associated with the layers may also include operations other than convolutions and activation functions. In that case, the other operations may either be circuitized or processed by the CPU.
  • For example, the operations constituting a fully connected layer may be circuitized as they are, or may be treated as a convolution with large parameters and circuitized as a convolution layer.
  • The convolution layer analysis unit 402 receives the calculation information of each layer in the neural network from the calculation structure classification unit 401 and analyzes the layers that perform convolution operations. Specifically, it specifies the parameters of the convolution operation performed in each layer that performs one.
  • The parameters include, for example, the input size, the output size, the number of kernels, the kernel size, and the BorderMode. A record of these parameters might look like the sketch below.
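  • An illustrative C record of the per-layer convolution parameters listed above; the field names and the border-mode encoding are assumptions.

```c
/* Parameters the convolution layer analysis unit 402 extracts per layer. */
struct conv_params {
    int in_h, in_w, in_ch;    /* input size                           */
    int out_h, out_w, out_ch; /* output size                          */
    int num_kernels;          /* number of kernels                    */
    int kernel_h, kernel_w;   /* kernel size                          */
    int border_mode;          /* BorderMode, e.g. 0 = valid, 1 = same */
};
```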
  • The activation layer analysis unit 403 receives the calculation information of each layer in the neural network from the calculation structure classification unit 401 and analyzes the layers that perform processing based on an activation function. Specifically, it specifies which activation function is used in each such layer.
  • FIG. 5 is a diagram showing the configuration of the neural network division unit 202.
  • As shown in FIG. 5, the neural network division unit 202 includes a convolutional layer circuitization unit 501, an activation layer circuitization unit 502, a circuit scale calculation unit 503, and a circuitization location determination unit 504.
  • The convolutional layer circuitization unit 501 receives the analysis result from the network structure analysis unit 201 and outputs information related to circuitizing the convolution operations.
  • The activation layer circuitization unit 502 receives the analysis result from the network structure analysis unit 201 and outputs information related to circuitizing the activation functions.
  • The circuit scale calculation unit 503 receives the circuitization information output by the convolutional layer circuitization unit 501 and the activation layer circuitization unit 502, and calculates the circuit scale when each operation is circuitized.
  • The circuitization location determination unit 504 receives the circuit scale of each operation calculated by the circuit scale calculation unit 503, and separates the operations to be circuitized from the operations to be processed, without circuitization, by a processor such as the CPU.
  • FIG. 6 is a flowchart showing the processing of the convolutional layer circuitization unit 501. The operation of the convolutional layer circuitization unit 501 is described below with reference to FIG. 6.
  • The convolutional layer circuitization unit 501 first obtains the information of one convolutional layer from the per-layer calculation information received from the network structure analysis unit 201 (step S601). It then checks the parameters included in the acquired information, associates the parameter information with that convolutional layer, and saves it as convolutional layer information (step S602).
  • Next, the convolutional layer circuitization unit 501 checks whether any convolutional layer whose information has not yet been acquired remains in the per-layer calculation information (step S603). If such a layer remains, the process returns to step S601, and the convolutional layer circuitization unit 501 performs steps S601 and S602 for the next convolutional layer.
  • Once the information of all convolutional layers has been acquired, the convolutional layer circuitization unit 501 extracts, from the convolutional layers corresponding to the saved convolutional layer information, the layers having the same parameters, and groups those layers (step S604). In other words, the convolutional layer circuitization unit 501 groups the convolutional layers by the parameters they have.
  • Here the unit of grouping is a layer, but the product-sum operations constituting the convolution within a layer may be used as an even smaller grouping unit.
  • Next, the convolutional layer circuitization unit 501 checks whether the number of created groups is greater than or equal to a predetermined number (step S605). If it is, the convolutional layer circuitization unit 501 further groups together groups having similar parameters (step S606). "Similar" here may be defined as parameters being close to each other within a certain range, or as an inclusion relation in which the parameters of one group contain the parameters of the other. Groups whose parameters are in an inclusion relation are merged because, if the containing operation is circuitized, the contained operation can be executed using part of that circuit. When the grouping result is saved, information on how many times the operations belonging to each group are used in the entire neural network is saved in association with it. A sketch of this grouping procedure appears below.
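  • The following is a minimal C sketch of steps S601 to S604 (grouping by identical parameters), using the conv_params record sketched earlier; the similar-parameter merging of steps S605 and S606 is only indicated in a comment. All names and limits are assumptions.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LAYERS 256
#define MAX_GROUPS 256

struct conv_group {
    struct conv_params params;  /* representative parameters of the group */
    int layer_ids[MAX_LAYERS];  /* layers belonging to the group          */
    int n_layers;               /* also the "times used" count saved with
                                 * the grouping result                    */
};

static bool params_equal(const struct conv_params *a,
                         const struct conv_params *b)
{
    /* conv_params holds only ints, so a byte comparison suffices here. */
    return memcmp(a, b, sizeof *a) == 0;
}

/* Steps S601-S604: visit every convolutional layer and group layers
 * whose parameters are identical. Returns the number of groups. */
int group_conv_layers(const struct conv_params *layers, int n,
                      struct conv_group *groups)
{
    int n_groups = 0;
    for (int i = 0; i < n; i++) {
        int g;
        for (g = 0; g < n_groups; g++)
            if (params_equal(&layers[i], &groups[g].params))
                break;
        if (g == n_groups) {            /* first layer with these params */
            groups[g].params = layers[i];
            groups[g].n_layers = 0;
            n_groups++;
        }
        groups[g].layer_ids[groups[g].n_layers++] = i;
    }
    /* Steps S605-S606 (not shown): if n_groups is at or above a
     * predetermined number, merge groups whose parameters are close to
     * each other or stand in an inclusion relation. */
    return n_groups;
}
```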
  • FIG. 7 is a flowchart showing the processing of the activation layer circuitization unit 502. The operation of the activation layer circuitization unit 502 is described below with reference to FIG. 7.
  • The activation layer circuitization unit 502 first obtains the activation function of each layer included in the per-layer calculation information received from the network structure analysis unit 201, and saves it as activation function information (step S701). The activation layer circuitization unit 502 then extracts the identical activation functions from the saved activation functions and groups them (step S702). In other words, the activation layer circuitization unit 502 groups the layers that perform processing based on an activation function by activation function. When the grouping result is saved, information on how many times the operations belonging to each group are used in the entire neural network is saved in association with it.
  • Next, the activation layer circuitization unit 502 obtains one of the grouped activation functions (step S703) and checks whether the activation function can be linearly approximated (step S704). If it can, the linearly approximated function is saved in association with the activation function (step S705).
  • The activation layer circuitization unit 502 then checks whether any of the grouped activation functions has not yet been obtained (step S706). If one remains, the process returns to step S703, and the activation layer circuitization unit 502 performs steps S703 to S706 for the next activation function. In other words, the activation layer circuitization unit 502 performs steps S703 to S706 on all of the grouped activation functions. A sketch of this grouping appears below.
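  • A minimal C sketch of the activation-function grouping of steps S701 to S706. The enum of activation functions and the approximability rule (piecewise-linear "hard" variants for sigmoid and tanh) are illustrative assumptions.

```c
#include <stdbool.h>

enum act_fn { ACT_RELU, ACT_SIGMOID, ACT_TANH };

struct act_group {
    enum act_fn fn;
    int  times_used;    /* occurrences in the whole network, saved with
                         * the grouping result                          */
    bool approximable;  /* steps S703-S705: linear approximation known? */
};

static bool linearly_approximable(enum act_fn fn)
{
    switch (fn) {
    case ACT_RELU:    return true;  /* already piecewise linear          */
    case ACT_SIGMOID: return true;  /* e.g. a hard-sigmoid approximation */
    case ACT_TANH:    return true;  /* e.g. a hard-tanh approximation    */
    }
    return false;
}

/* Steps S701-S702: group the layers' activation functions by kind,
 * counting how often each is used. Returns the number of groups. */
int group_activations(const enum act_fn *layers, int n,
                      struct act_group *groups)
{
    int n_groups = 0;
    for (int i = 0; i < n; i++) {
        int g;
        for (g = 0; g < n_groups; g++)
            if (groups[g].fn == layers[i])
                break;
        if (g == n_groups) {
            groups[g].fn = layers[i];
            groups[g].times_used = 0;
            groups[g].approximable = linearly_approximable(layers[i]);
            n_groups++;
        }
        groups[g].times_used++;
    }
    return n_groups;
}
```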
  • The circuit scale calculation unit 503 calculates the circuit scale when each group of the convolutional layer information grouped by the convolutional layer circuitization unit 501, and each group of the activation function information grouped by the activation layer circuitization unit 502, is circuitized. The circuit scale can be calculated, for example, from the number of arithmetic units, such as adders and multipliers, included in the circuit.
  • FIG. 8 is a flowchart showing the processing of the circuit scale calculation unit 503. The operation of the circuit scale calculation unit 503 is described below with reference to FIG. 8.
  • The circuit scale calculation unit 503 first obtains one group of convolutional layers included in the grouped convolutional layer information (step S801). It then determines the arithmetic units necessary for circuitizing the convolution of that layer group, based on the parameters saved in association with the acquired convolutional layers (step S802). It then calculates, from the type and number of the determined arithmetic units, the circuit scale when the convolution is circuitized (step S803).
  • The arithmetic units selected when circuitizing an operation need not be the smallest arithmetic units capable of executing it; arithmetic units with a larger circuit scale may be selected. This is because data access is more efficient when the circuit size is a power of two, so the circuit scale of the arithmetic units is intentionally increased so that it becomes a power of two. A sketch of such an estimate follows.
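  • A rough C sketch of the per-group estimate of steps S801 to S803, using the conv_params record from earlier. Counting one multiplier per kernel tap, an adder tree to sum the products, rounding the unit count up to a power of two, and the relative cost weights are all illustrative assumptions.

```c
#include <stddef.h>

static size_t next_pow2(size_t x)
{
    size_t p = 1;
    while (p < x)
        p <<= 1;
    return p;
}

/* Steps S802-S803: estimate the circuit scale of one convolution group
 * from the arithmetic units a fully parallel kernel would need. */
size_t conv_circuit_scale(const struct conv_params *p)
{
    size_t taps  = (size_t)p->kernel_h * p->kernel_w * p->in_ch;
    size_t mults = next_pow2(taps);  /* deliberately oversized to a
                                      * power of two, as described above */
    size_t adds  = mults - 1;        /* adder tree summing the products  */

    const size_t MULT_COST = 4, ADD_COST = 1;  /* relative unit sizes */
    return mults * MULT_COST + adds * ADD_COST;
}
```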
  • Next, the circuit scale calculation unit 503 checks whether any group of convolutional layers that has not yet been acquired remains in the grouped convolutional layer information (step S804). If one remains, the process returns to step S801, and the circuit scale calculation unit 503 performs steps S801 to S803 on the next group. In other words, the circuit scale calculation unit 503 performs steps S801 to S803 for all groups of convolutional layers.
  • The circuit scale calculation unit 503 then obtains one activation function group corresponding to an activation layer included in the grouped activation function information (step S805). Next, the circuit scale calculation unit 503 determines the arithmetic units necessary for circuitizing the activation function (step S806). It then calculates, from the type and number of those arithmetic units, the circuit scale when the activation function is circuitized (step S807).
  • Next, the circuit scale calculation unit 503 determines whether the activation function can be linearly approximated (step S808). If it can, the approximating function is specified (step S809), the arithmetic units necessary for circuitizing the approximating function are determined (step S810), the circuit scale is calculated from the type and number of the determined arithmetic units (step S811), and the calculated circuit scale is saved in association with the activation function.
  • The circuit scale calculation unit 503 then checks whether any group of activation functions that has not yet been acquired remains in the grouped activation function information (step S812). If one remains, the process returns to step S805, and the circuit scale calculation unit 503 performs steps S805 to S812 for the next group. In other words, the circuit scale calculation unit 503 performs steps S805 to S812 for all groups of activation functions.
  • The circuitization location determination unit 504 receives, from the circuit scale calculation unit 503, the list of circuits required to process the neural network together with their circuit scales, and determines, taking the capacity of the target FPGA into account, which processes of the neural network should be circuitized. The processes determined not to be circuitized are processed in software on a processor such as the CPU.
  • Two criteria are considered when selecting the processes to circuitize: whether the execution time of the neural network processing becomes small (short), and whether the accuracy of the neural network remains high.
  • The execution time of neural network processing is shorter when a process is circuitized than when it is processed in software. Therefore, when selecting the processes to circuitize with the execution time of the neural network processing as the criterion, among the list of circuits required to process the neural network received from the circuit scale calculation unit 503, a process is given higher priority for circuitization the larger (longer) its processing time when processed in software. The number of times each circuit is used during neural network processing is also taken into account. For example, suppose the execution time of process A when processed in software is 5 milliseconds and that of process B is 30 milliseconds, but process A is used 20 times in the neural network and process B twice. Viewed over the entire neural network, the processing time of process A (5 ms x 20 = 100 ms) is longer than that of process B (30 ms x 2 = 60 ms), so it is determined that process A should be circuitized with priority over process B. A sketch of this metric follows.
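  • A small C sketch of that prioritization: total software time is per-run time multiplied by the number of uses, and candidates are sorted in descending order of that total. The struct and function names are assumptions; the commented example reproduces the A/B arithmetic from the text.

```c
#include <stdlib.h>

struct candidate {
    const char *name;
    double sw_time_ms;  /* execution time when processed in software */
    int    times_used;  /* uses per full evaluation of the network   */
};

static double total_sw_time(const struct candidate *c)
{
    return c->sw_time_ms * c->times_used;
}

/* qsort comparator: larger total software time first, i.e. higher
 * circuitization priority. */
static int by_total_time_desc(const void *a, const void *b)
{
    double ta = total_sw_time((const struct candidate *)a);
    double tb = total_sw_time((const struct candidate *)b);
    return (ta < tb) - (ta > tb);
}

/* Example from the text: A = 5 ms x 20 = 100 ms, B = 30 ms x 2 = 60 ms,
 * so A sorts ahead of B:
 *
 *   struct candidate c[] = { {"A", 5.0, 20}, {"B", 30.0, 2} };
 *   qsort(c, 2, sizeof c[0], by_total_time_desc);
 */
```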
  • The accuracy of the neural network is basically the same whether a process is circuitized or processed in software. However, when an activation function is replaced with a linear approximation so that it can be circuitized, the accuracy may decrease. Therefore, when deciding whether to circuitize activation function processing, both the degree of reduction in execution time and the degree of decrease in accuracy due to circuitization must be considered.
  • It is also considered that the more times an activation function is used within the neural network, the greater the decrease in accuracy when that activation function is linearly approximated.
  • The location where the activation function is used within the neural network also affects the degree of accuracy decrease caused by linear approximation.
  • For example, an activation function near the input layer, which is the first layer of the neural network, directly affects the input data to the neural network, so the decrease in accuracy caused by linear approximation is considered large, whereas an activation function used in processing after feature extraction is considered to cause only a small decrease in accuracy when linearly approximated.
  • The circuitization location determination unit 504 evaluates the degree of decrease in the accuracy of the neural network by integrating these factors. For example, the circuitization location determination unit 504 may calculate the product of the error caused by the linear approximation of an activation function and the number of times that activation function is used within the neural network, and use the product as the degree of accuracy decrease due to the linear approximation.
  • The circuitization location determination unit 504 determines the circuitization locations by weighing the balance between the conflicting factors of the execution time of the neural network processing and the accuracy of the neural network. For example, a weighted linear sum of the execution time of the neural network processing and the accuracy of the neural network can be used as an evaluation function, and the portions to be circuitized can be determined so that the value of the evaluation function is minimized.
  • Constraints may also be added, such as an upper limit on the execution time of the neural network processing, a lower limit on the accuracy of the neural network, the capacity of the shared memory required between the CPU and the FPGA, and an upper limit on the bandwidth of the data transfer bus. Furthermore, since extreme increases in execution time and decreases in accuracy are undesirable, a nonlinear function that penalizes increases in execution time and decreases in accuracy may be used as the evaluation function. One possible form of such an evaluation function is sketched below.
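  • The following C sketch shows one possible form of such an evaluation function: a weighted linear sum of execution time and accuracy degradation (the latter computable, as above, as approximation error times use count), plus a hinge-style penalty beyond soft limits. The weights, limits, and penalty shape are illustrative assumptions, and the search that minimizes this value over candidate circuitization choices is not shown.

```c
#include <math.h>

struct eval_weights {
    double w_time, w_acc;   /* linear weights                           */
    double time_limit_ms;   /* soft upper bound on execution time       */
    double acc_loss_limit;  /* soft upper bound on accuracy degradation */
    double penalty;         /* extra slope applied beyond the bounds    */
};

/* Lower is better; the circuitization location determination unit 504
 * would pick the assignment of operations to circuit/software that
 * minimizes this value, subject to the FPGA capacity and the other
 * constraints listed above. */
double evaluate(double exec_time_ms, double acc_loss,
                const struct eval_weights *w)
{
    double cost = w->w_time * exec_time_ms + w->w_acc * acc_loss;
    cost += w->penalty * fmax(0.0, exec_time_ms - w->time_limit_ms);
    cost += w->penalty * fmax(0.0, acc_loss - w->acc_loss_limit);
    return cost;
}
```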
  • As described above, whether each process of the neural network is to be circuitized on the FPGA or processed in software on a processor such as a CPU is determined in consideration of the degree to which circuitizing the process shortens the execution time of the neural network processing and the degree to which it decreases the accuracy, and the neural network processing is designed accordingly. This makes it possible to design a neural network in which a processor and circuits on an FPGA process in cooperation with each other, and to realize a neural network capable of high-speed processing on hardware with small resources.
  • FIGS. 9 and 10 are diagrams each showing an example of the hardware configuration of the neural network construction unit 101.
  • Each function of the constituent elements of the neural network construction unit 101 shown in FIG. 1 is realized, for example, by the processing circuit 10 shown in FIG. 9. That is, the neural network construction unit 101 includes a processing circuit 10 for determining, for each operation constituting the neural network, whether to circuitize the operation or process it in software, and for creating and outputting circuit information for circuitizing the operations determined to be circuitized and a program for software-processing the operations determined to be processed in software.
  • The processing circuit 10 may be dedicated hardware, or may be configured using a processor (also called a CPU (Central Processing Unit), processing device, arithmetic device, microprocessor, microcomputer, or DSP (Digital Signal Processor)) that executes programs stored in memory.
  • The processing circuit 10 may be, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a combination of these.
  • The functions of the constituent elements of the neural network construction unit 101 may be realized by individual processing circuits, or may be realized collectively by a single processing circuit.
  • FIG. 10 shows an example of the hardware configuration of the neural network construction unit 101 in the case where the processing circuit 10 is configured using a processor 11 that executes programs.
  • In this case, the functions of the constituent elements of the neural network construction unit 101 are realized by software or the like (software, firmware, or a combination of software and firmware).
  • The software and the like are written as programs and stored in the memory 12.
  • The processor 11 realizes the functions of the respective units by reading and executing the programs stored in the memory 12. That is, the neural network construction unit 101 includes a memory 12 for storing programs that, when executed by the processor 11, determine, for each operation constituting the neural network, whether to circuitize the operation or process it in software, and create and output circuit information for circuitizing the operations determined to be circuitized and a program for software-processing the operations determined to be processed in software.
  • In other words, these programs can be said to cause a computer to execute the procedures and methods of the operations of the constituent elements of the neural network construction unit 101.
  • Here, the memory 12 may be, for example, a non-volatile or volatile semiconductor memory, an HDD (Hard Disk Drive), a magnetic disk, a flexible disk, an optical disc, a compact disc, a mini disc, or a DVD (Digital Versatile Disc), including their drive devices, or any storage medium to be used in the future.
  • The configuration is not limited to the above; some of the constituent elements of the neural network construction unit 101 may be realized by dedicated hardware, and the other constituent elements by software or the like.
  • For example, the functions of some constituent elements can be realized by the processing circuit 10 as dedicated hardware, while for the other constituent elements the processing circuit 10 as the processor 11 can realize their functions by reading and executing the programs stored in the memory 12.
  • As described above, the neural network construction unit 101 can realize each of the above functions by hardware, software, or the like, or a combination thereof.
  • FIG. 11 is a block diagram showing the configuration of the neural network device 100 according to Embodiment 2.
  • In FIG. 11, elements that are the same as or equivalent to those shown in Embodiment 1 (FIG. 1) are denoted by the same reference numerals, and their description is omitted here.
  • In Embodiment 2, the neural network device 100 includes a neural network execution unit 901 in addition to the neural network construction unit 101.
  • The neural network execution unit 901 includes a storage unit 905, a CPU 902, an FPGA 903, a memory 904, and a data acquisition circuit 906.
  • The neural network execution unit 901 executes the arithmetic processing of the neural network, with the CPU 902 and the FPGA 903 working together, based on the program and circuit information created by the neural network construction unit 101.
  • The storage unit 905 stores the program and circuit information created by the neural network construction unit 101.
  • The CPU 902 reads the program stored in the storage unit 905 and, based on the program, performs the arithmetic processing of the neural network assigned to the CPU 902 and controls the FPGA 903.
  • The FPGA 903 reads the circuit information stored in the storage unit 905, configures arithmetic circuits based on the circuit information, and performs the arithmetic processing of the neural network assigned to the FPGA 903.
  • The memory 904 relays the data exchanged between the CPU 902 and the FPGA 903. More specifically, the CPU 902 stores in the memory 904 the input data for operations by the circuits built on the FPGA 903, and the FPGA 903 reads this input data and uses it for the operations of those circuits. The FPGA 903 in turn stores the operation results in the memory 904, and the CPU 902 reads the operation results from the memory 904 and uses them in software processing.
  • The data acquisition circuit 906 is a circuit used when the FPGA 903 reads data from the memory 904. In this embodiment, the data acquisition circuit 906 is constructed as one of the arithmetic circuits on the FPGA 903.
  • In general, when an FPGA reads data from an external memory and performs an operation, it starts acquiring the data upon receiving a notification that the necessary data has been stored in a predetermined location in the memory.
  • The data acquisition circuit 906 is provided to omit this notification process. Specifically, the size of the input data of each circuit on the FPGA 903 is determined for the data acquisition circuit 906 in advance, and when data of that size becomes available in the memory 904, the data acquisition circuit 906 automatically transfers the data to the FPGA 903, as sketched below.
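  • A minimal C sketch of that polling behavior; the struct, the transfer_to_fpga helper, and the byte-counting scheme are assumptions introduced for illustration (on real hardware this logic would live in the data acquisition circuit itself rather than in CPU code).

```c
#include <stdbool.h>
#include <stddef.h>

struct acq_channel {
    int    circuit_id;
    size_t input_size;   /* preconfigured from the circuit information */
    size_t bytes_ready;  /* bytes written to shared memory so far      */
};

extern void transfer_to_fpga(int circuit_id, size_t nbytes);  /* assumed */

/* Once a full input block is available, transfer it without waiting
 * for a CPU command; otherwise keep waiting. */
bool poll_and_transfer(struct acq_channel *ch)
{
    if (ch->bytes_ready < ch->input_size)
        return false;
    transfer_to_fpga(ch->circuit_id, ch->input_size);
    ch->bytes_ready -= ch->input_size;
    return true;
}
```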
  • The size of the input data is determined when the neural network construction unit 101 determines the circuit configuration, and can therefore be calculated by the data acquisition circuit control data generation unit 303 of the neural network construction unit 101.
  • The neural network construction unit 101 includes the input data size of each circuit, calculated by the data acquisition circuit control data generation unit 303, in the circuit information, and stores the circuit information in the storage unit 905.
  • Although the CPU 902, the FPGA 903, and the memory 904 are shown as separate blocks in FIG. 11, they may be integrated into a single device.
  • Reference signs: 100 neural network device; 101 neural network construction unit; 102 neural network analysis unit; 103 neural network calculation method output unit; 104 storage unit; 201 network structure analysis unit; 202 neural network division unit; 301 control program creation unit; 302 arithmetic circuit creation unit; 303 data acquisition circuit control data generation unit; 401 calculation structure classification unit; 402 convolution layer analysis unit; 403 activation layer analysis unit; 501 convolutional layer circuitization unit; 502 activation layer circuitization unit; 503 circuit scale calculation unit; 504 circuitization location determination unit; 901 neural network execution unit; 902 CPU; 903 FPGA; 904 memory; 905 storage unit; 906 data acquisition circuit; 10 processing circuit; 11 processor; 12 memory.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

A neural network analysis unit (102) of a neural network device (100) has a network structure analysis unit (201) that analyzes the calculation structure of a neural network, and a neural network division unit (202) that determines, for each operation obtained by dividing the neural network, whether to implement the operation as a circuit or process it in software. The neural network division unit (202) has a convolutional layer circuitization unit (501) that groups layers having identical or similar parameters among the layers of the neural network where convolution operations are performed, a circuit scale calculation unit (503) that calculates, for each convolution operation of the grouped layers, the circuit scale if that operation were made into a circuit, and a circuitization location determination unit (504) that determines the operations to make into circuits on the basis of the circuit scales calculated by the circuit scale calculation unit (503).

Description

Neural network device

The present disclosure relates to artificial intelligence technology, and particularly to an apparatus and method for creating a program that processes a neural network.

In fields such as image processing, neural networks can perform processing with extremely high accuracy and have recently come into wide use. At the same time, neural networks involve a large number of calculations and are known to have a high processing load. To complete processing within a desired time, neural networks are therefore often realized on dedicated processors such as GPGPUs (General Purpose Graphics Processing Units) or on hardware such as FPGAs (Field Programmable Gate Arrays) and ASICs (Application Specific Integrated Circuits).

Although a neural network includes a large number of operations, it is composed of a combination of operations such as convolutions, activation functions, and full connections, and its structure is relatively simple. For example, Patent Document 1 below proposes a technique that utilizes this characteristic of neural networks to solve the problem of the large amount of calculation. In the technique of Patent Document 1, different network structures are compiled into operations of the same arithmetic unit by controlling that unit with a single instruction corresponding to each layer operation of the multi-layer computation of a neural network, whereby the same device can realize the logic operations of all layers.

Patent Document 1: JP 2019-139747 A

The technique of Patent Document 1 can use hardware resources efficiently and achieve high-speed neural network processing by chaining single instructions prepared in advance on the hardware according to the network structure. However, because processing is performed with combinations of single instructions prepared in advance on the hardware, there is a concern that processing may not be completed within the desired time.

The present disclosure has been made to solve the above problem, and aims to enable the design of neural network processing that can achieve desired performance (execution time and accuracy) on hardware with fewer resources.

A neural network device according to the present disclosure includes: a neural network analysis unit that determines, for each operation constituting a neural network, a calculation method, namely whether to implement the operation as a circuit or process it in software; and a neural network calculation method output unit that creates and outputs circuit information for circuitizing the operations determined to be circuitized and a program for software-processing the operations determined to be processed in software. The neural network analysis unit has a network structure analysis unit that analyzes the calculation structure of the neural network, and a neural network division unit that determines, for each operation obtained by dividing the neural network, whether to circuitize the operation or process it in software. The network structure analysis unit has a calculation structure classification unit that classifies each layer of the neural network according to the types of operations constituting the layer, and a convolution layer analysis unit that specifies the parameters of the convolution operation for each layer classified as performing a convolution. The neural network division unit has a convolutional layer circuitization unit that groups layers having identical or similar parameters based on the parameters specified by the convolution layer analysis unit, a circuit scale calculation unit that calculates, for each convolution operation of the grouped layers, the circuit scale when that operation is circuitized, and a circuitization location determination unit that determines the operations to be circuitized based on the circuit scales calculated by the circuit scale calculation unit.

According to the present disclosure, it is possible to design neural network processing that achieves desired performance (processing within the required time, and accuracy) on hardware with fewer resources.

The objects, features, aspects, and advantages of the present disclosure will become more apparent from the following detailed description and the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of a neural network device according to Embodiment 1. FIG. 2 is a block diagram showing the configuration of the neural network analysis unit according to Embodiment 1. FIG. 3 is a block diagram showing the configuration of the neural network calculation method output unit according to Embodiment 1. FIG. 4 is a block diagram showing the configuration of the network structure analysis unit according to Embodiment 1. FIG. 5 is a block diagram showing the configuration of the neural network division unit according to Embodiment 1. FIG. 6 is a flowchart showing the processing of the convolutional layer circuitization unit according to Embodiment 1. FIG. 7 is a flowchart showing the processing of the activation layer circuitization unit according to Embodiment 1. FIG. 8 is a flowchart showing the processing of the circuit scale calculation unit according to Embodiment 1. FIGS. 9 and 10 are diagrams each showing an example of the hardware configuration of the neural network construction unit. FIG. 11 is a block diagram showing the configuration of a neural network device according to Embodiment 2.
 <実施の形態1>
 図1は、実施の形態1に係るニューラルネットワーク装置100の構成を示すブロック図である。図1のように、ニューラルネットワーク装置100は、ニューラルネットワーク解析部102と、ニューラルネットワーク演算方式出力部103と、記憶部104とを有するニューラルネットワーク構築部101を備える。
<Embodiment 1>
FIG. 1 is a block diagram showing the configuration of a neural network device 100 according to the first embodiment. As shown in FIG. 1, the neural network device 100 includes a neural network construction unit 101 having a neural network analysis unit 102, a neural network calculation method output unit 103, and a storage unit 104.
 ニューラルネットワーク解析部102は、記憶部104に格納されたニューラルネットワークのネットワーク構造データを読み取り、そのネットワーク構造を解析して、ニューラルネットワークを動作させるプログラムおよび回路を含む演算方式を決定し、決定した演算方式を出力する。すなわち、ニューラルネットワーク解析部102は、ニューラルネットワークを構成する各演算について、回路化するかソフトウェア処理するかという演算方式を決定する。 The neural network analysis unit 102 reads the network structure data of the neural network stored in the storage unit 104, analyzes the network structure, determines a calculation method including a program and circuit for operating the neural network, and executes the determined calculation. Output the method. That is, the neural network analysis unit 102 determines the calculation method for each calculation constituting the neural network, such as circuitization or software processing.
 ニューラルネットワーク演算方式出力部103は、ニューラルネットワーク解析部102から受け取った演算方式を基に、CPU(Central Processing Unit)等のプロセッサで動作するプログラムのデータと、FPGA上に演算回路を構築するための回路情報とを作成して出力する。すなわち、ニューラルネットワーク演算方式出力部103は、ニューラルネットワーク解析部102により回路化すると決定された演算を回路化するための回路情報と、ニューラルネットワーク解析部102によりソフトウェア処理すると決定された演算をソフトウェア処理するためのプログラムを作成して出力する。 Based on the calculation method received from the neural network analysis unit 102, the neural network calculation method output unit 103 outputs program data that runs on a processor such as a CPU (Central Processing Unit) and data for constructing a calculation circuit on an FPGA. Create and output circuit information. That is, the neural network calculation method output unit 103 outputs circuit information for converting into a circuit the calculations determined by the neural network analysis unit 102 to be processed into a circuit, and software processes the calculations determined to be processed by the neural network analysis unit 102 using software. Create a program to do this and output it.
 図2は、ニューラルネットワーク解析部102の構成を示すブロック図である。図2のように、ニューラルネットワーク解析部102は、ネットワーク構造解析部201と、ニューラルネットワーク分割部202とを有する。 FIG. 2 is a block diagram showing the configuration of the neural network analysis section 102. As shown in FIG. 2, the neural network analysis section 102 includes a network structure analysis section 201 and a neural network division section 202.
 ネットワーク構造解析部201は、記憶部104から読み取ったニューラルネットワークのネットワーク構造データを基に、ネットワーク内の演算構造を解析し、その解析結果を出力する。 The network structure analysis unit 201 analyzes the calculation structure within the network based on the network structure data of the neural network read from the storage unit 104, and outputs the analysis result.
 ニューラルネットワーク分割部202は、ネットワーク構造解析部201から入力されるネットワーク内の演算構造の解析結果を受け、ネットワーク構造を構成する演算を、予め定められた規模の演算単位に分割し、分割後の各演算の処理をCPUで動作させるかFPGAで動作させるか(つまり、ソフトウェア処理するか回路化するか)を決定し、その決定結果を分割後の各演算と関連付けして出力する。 The neural network dividing unit 202 receives the analysis result of the calculation structure in the network input from the network structure analysis unit 201, divides the calculations making up the network structure into calculation units of a predetermined size, and divides the calculations into calculation units of a predetermined size. It is determined whether the processing of each calculation is to be performed by the CPU or by the FPGA (that is, whether to perform software processing or circuitization), and the determination result is output in association with each calculation after division.
 図3は、ニューラルネットワーク演算方式出力部103の構成を示すブロック図である。図3のように、ニューラルネットワーク演算方式出力部103は、制御プログラム作成部301と、演算回路作成部302と、データ取得回路用制御データ生成部303とを有する。 FIG. 3 is a block diagram showing the configuration of the neural network calculation method output section 103. As shown in FIG. 3, the neural network calculation method output unit 103 includes a control program creation unit 301, an arithmetic circuit creation unit 302, and a data acquisition circuit control data generation unit 303.
 制御プログラム作成部301は、ニューラルネットワーク解析部102が決定したニューラルネットワークの各処理の演算方式を受け、CPUでソフトウェア処理する演算について、CPU向けのプログラムを作成して出力する。当該プログラムには、FPGA上で動く演算回路の入出力を管理し、FPGAを制御することでニューラルネットワーク全体の処理を可能とする制御プログラムも含まれる。制御プログラムは、例えば、演算回路Aへデータを入力し、演算回路Aから演算結果を受け取り、それを演算回路Bへと入力する、といった処理を行う。また、演算処理の一部をFPGA上で行わずCPU上で行うために、演算処理の一部を制御プログラムに含ませてもよい。その場合、制御プログラムは、例えば、演算回路Aと演算回路Bの出力の積を計算する、といった処理を行う。制御プログラム作成部301が出力するプログラムは、例えばC言語などで記述されたコードを、コンパイラにより特定のCPU向けにコンパイルした実行プログラムのバイナリデータなどである。 The control program creation unit 301 receives the calculation method for each process of the neural network determined by the neural network analysis unit 102, and creates and outputs a program for the CPU for calculations to be processed by software in the CPU. The program also includes a control program that manages the input/output of an arithmetic circuit running on the FPGA and enables processing of the entire neural network by controlling the FPGA. The control program performs processing such as inputting data to arithmetic circuit A, receiving a calculation result from arithmetic circuit A, and inputting it to arithmetic circuit B, for example. Furthermore, in order to perform part of the calculation process on the CPU instead of on the FPGA, a part of the calculation process may be included in the control program. In that case, the control program performs processing such as calculating the product of the outputs of arithmetic circuit A and arithmetic circuit B, for example. The program output by the control program creation unit 301 is, for example, binary data of an executable program obtained by compiling a code written in C language or the like using a compiler for a specific CPU.
 演算回路作成部302は、ニューラルネットワーク解析部102が決定したニューラルネットワークを動作させるプログラムおよび回路を含む演算方式を受け、FPGA上で動作する演算回路を構築するための回路情報を作成して出力する。 The arithmetic circuit creation unit 302 receives the arithmetic method including the program and circuit for operating the neural network determined by the neural network analysis unit 102, and creates and outputs circuit information for constructing an arithmetic circuit that operates on the FPGA. .
 データ取得回路用制御データ生成部303は、FPGAがCPUからデータを受け取る際に使用する、共有メモリなどからデータを受けとる専用回路に提供するパラメータを、回路情報に基づいて算出する。例えば、FPGA上の演算回路Aが、一定のサイズ(データ幅)のデータを受け取り、それに含まれる一定数のデータの平均を演算するものであり、一定のサイズのデータが共有メモリに格納されると、演算回路AがCPUからの演算実行指令を受け取ることなしに演算を開始する動作、すなわち演算回路Aが自律的にデータを取得する動作を行う場合、上記のパラメータは、その動作を可能にするための「一定のサイズ」の値を、予めFPGA上の演算回路に組み込むためのものである。 The data acquisition circuit control data generation unit 303 calculates parameters to be provided to a dedicated circuit that receives data from a shared memory or the like, which is used when the FPGA receives data from the CPU, based on circuit information. For example, an arithmetic circuit A on an FPGA receives data of a certain size (data width) and calculates the average of a certain number of data contained therein, and the data of a certain size is stored in the shared memory. If the arithmetic circuit A performs an operation of starting an operation without receiving an operation execution command from the CPU, that is, an operation of autonomously acquiring data, the above parameters enable the operation. This is to pre-incorporate a "fixed size" value for the calculation into the arithmetic circuit on the FPGA.
 図4は、ネットワーク構造解析部201の構成素示すブロック図である。図4のように、ネットワーク構造解析部201は、演算構造分類部401と、畳み込み層分析部402と、活性化層分析部403とを有する。 FIG. 4 is a block diagram showing the components of the network structure analysis unit 201. As shown in FIG. 4, the network structure analysis section 201 includes an arithmetic structure classification section 401, a convolution layer analysis section 402, and an activation layer analysis section 403.
 演算構造分類部401は、記憶部104から読み取ったニューラルネットワークのネットワーク構造データを基に、ニューラルネットワーク内の各層がどのような演算で構成されているかを分析し、その分析結果を各層の演算情報として各層に関連付けて出力する。各層に関連付ける演算は、例えば、畳み込み演算であったり、活性化関数を用いた演算であったりする。なお、各層に関連付ける演算には、畳み込み演算および活性化関数以外の他の演算が含まれていてもよい。その場合、他の演算は、回路化されてもよいし、CPUで処理されてもよい。例えば、全結合層を構成する演算は、そのまま回路化されてもよいし、パラメータの大きな畳み込み演算とみなして畳み込み層として回路化してもよい。 Based on the network structure data of the neural network read from the storage unit 104, the calculation structure classification unit 401 analyzes what kind of calculation each layer in the neural network is composed of, and uses the analysis result as calculation information for each layer. It is associated with each layer and output as . The operation associated with each layer is, for example, a convolution operation or an operation using an activation function. Note that the operations associated with each layer may include operations other than the convolution operation and the activation function. In that case, other calculations may be implemented in a circuit or processed by a CPU. For example, the operations constituting the fully connected layer may be circuitized as they are, or may be treated as convolution operations with large parameters and circuitized as a convolution layer.
 畳み込み層分析部402は、演算構造分類部401からニューラルネットワーク内の各層の演算情報を受け取り、それらの層のうち畳み込み演算を行う層の分析を行う。具体的には、畳み込み演算を行う各層で行われる畳み込み演算のパラメータを特定する。当該パラメータは、例えば、入力サイズ、出力サイズ、カーネル数、カーネルサイズ、BorderModeなどである。 The convolutional layer analysis unit 402 receives the calculation information of each layer in the neural network from the calculation structure classification unit 401, and analyzes the layer that performs the convolution calculation among those layers. Specifically, the parameters of the convolution operation performed in each layer that performs the convolution operation are specified. The parameters include, for example, input size, output size, number of kernels, kernel size, BorderMode, and the like.
 活性化層分析部403は、演算構造分類部401からニューラルネットワーク内の各層の演算情報を受け取り、それらの層のうち活性化関数に基づく処理を行う層の分析を行う。具体的には、活性化関数に基づく処理を行う各層で用いられる活性化関数が何かを特定する。 The activation layer analysis unit 403 receives the calculation information of each layer in the neural network from the calculation structure classification unit 401, and analyzes the layer that performs processing based on the activation function among those layers. Specifically, what activation function is used in each layer that performs processing based on the activation function is specified.
 FIG. 5 is a diagram showing the configuration of the neural network division unit 202. As shown in FIG. 5, the neural network division unit 202 includes a convolution layer circuitization unit 501, an activation layer circuitization unit 502, a circuit scale calculation unit 503, and a circuitization location determination unit 504.
 The convolution layer circuitization unit 501 receives the analysis result from the network structure analysis unit 201 and outputs information relating to the circuitization of convolution processing.
 The activation layer circuitization unit 502 receives the analysis result from the network structure analysis unit 201 and outputs information relating to the circuitization of activation functions.
 The circuit scale calculation unit 503 receives the circuitization information output by the convolution layer circuitization unit 501 and the activation layer circuitization unit 502, and calculates the circuit scale that would result from circuitizing each operation.
 The circuitization location determination unit 504 receives the circuit scale information calculated by the circuit scale calculation unit 503 and separates the operations to be circuitized from those to be processed, without circuitization, on a processor such as a CPU.
 FIG. 6 is a flowchart showing the processing of the convolution layer circuitization unit 501. The operation of the convolution layer circuitization unit 501 is described below with reference to FIG. 6.
 The convolution layer circuitization unit 501 first obtains the information of one convolution layer from the per-layer operation information of the neural network received from the network structure analysis unit 201 (step S601). It then checks the parameters contained in the obtained information, associates the parameter information with that convolution layer, and stores it as convolution layer information (step S602).
 Next, the convolution layer circuitization unit 501 checks whether any convolution layers remain in the per-layer operation information whose information has not yet been obtained (step S603). If such layers remain, processing returns to step S601, and the convolution layer circuitization unit 501 performs steps S601 and S602 on the next convolution layer.
 Once the information of all convolution layers has been obtained, the convolution layer circuitization unit 501 extracts, from the convolution layers corresponding to the stored convolution layer information, layers that have identical parameters, and groups them together (step S604). In other words, the convolution layer circuitization unit 501 partitions the convolution layers into groups according to their parameters. Here the unit of grouping is the layer, but the product-sum operations that make up the convolution within a layer may be used as a finer grouping unit.
 When the grouping of step S604 is complete, the convolution layer circuitization unit 501 checks whether the number of groups created is at least a predetermined number (step S605). If so, it further groups together groups that have similar parameters (step S606). "Similar" here may be defined as parameters lying close to each other within a certain range, or as an inclusion relationship in which the parameters of one group contain those of another. Groups whose parameters are in an inclusion relationship are merged because, once the containing operation has been circuitized, the contained operation can be executed using part of that circuit. When the grouping results are stored, the number of times the operations belonging to each group are used across the entire neural network is stored in association with them.
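 A minimal sketch of this two-stage grouping, assuming scalar parameters and a dict-based layer description (both assumptions of this illustration, not of the disclosure), might read:

```python
# Group convolution layers by identical parameters; if too many groups
# remain, merge groups whose parameters are contained in a larger group's,
# since the larger circuit can execute the smaller operation with part of
# its hardware.
def group_conv_layers(conv_layers, max_groups=8):
    """conv_layers: dicts with 'kernel_size', 'in_size', 'out_size', 'n_kernels'."""
    groups = {}
    for layer in conv_layers:
        key = (layer["kernel_size"], layer["in_size"],
               layer["out_size"], layer["n_kernels"])
        groups.setdefault(key, []).append(layer)

    if len(groups) >= max_groups:
        merged = {}
        # Visit larger parameter sets first so smaller ones can join them.
        for key, members in sorted(groups.items(), reverse=True):
            host = next((k for k in merged
                         if all(a <= b for a, b in zip(key, k))), None)
            if host is not None:
                merged[host].extend(members)   # reuses part of the larger circuit
            else:
                merged[key] = list(members)
        groups = merged

    # Store how often each group's operation is used across the network.
    return {key: {"layers": members, "use_count": len(members)}
            for key, members in groups.items()}
```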
 FIG. 7 is a flowchart showing the processing of the activation layer circuitization unit 502. The operation of the activation layer circuitization unit 502 is described below with reference to FIG. 7.
 The activation layer circuitization unit 502 first obtains the activation function of each layer contained in the per-layer operation information of the neural network received from the network structure analysis unit 201, and stores it as activation function information (step S701). It then extracts identical activation functions from among those stored and groups them together (step S702). In other words, the activation layer circuitization unit 502 partitions the layers that perform activation-function-based processing into groups according to their activation function. When the grouping results are stored, the number of times the operations belonging to each group are used across the entire neural network is stored in association with them.
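 For the grouping itself, a minimal sketch (the pair layout is assumed for the illustration):

```python
# Group layers by the activation function they use and record each
# function's network-wide use count.
from collections import defaultdict

def group_activations(layers):
    """layers: (layer_name, activation_name) pairs."""
    groups = defaultdict(list)
    for layer_name, act in layers:
        groups[act].append(layer_name)
    return {act: {"layers": names, "use_count": len(names)}
            for act, names in groups.items()}

# group_activations([("c1", "relu"), ("c2", "relu"), ("out", "sigmoid")])
# -> {"relu": {"layers": ["c1", "c2"], "use_count": 2},
#     "sigmoid": {"layers": ["out"], "use_count": 1}}
```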
 Next, the activation layer circuitization unit 502 takes one of the grouped activation functions (step S703) and checks whether it can be linearly approximated (step S704). If it can, the linearly approximated function is stored in association with that activation function (step S705).
 The activation layer circuitization unit 502 then checks whether any grouped activation functions remain that have not yet been taken (step S706). If so, processing returns to step S703, and the activation layer circuitization unit 502 performs steps S703 to S706 on the next activation function. In other words, the activation layer circuitization unit 502 performs steps S703 to S706 on all of the grouped activation functions.
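 As one hedged illustration of the linear-approximation check of steps S704 and S705 (the tolerance, input range, and least-squares fit are assumptions of this sketch; the patent does not prescribe a method):

```python
# Fit a line to an activation function over the input range the network can
# produce, and accept the approximation only if the worst-case error is small.
import numpy as np

def linear_fit(f, lo, hi, n=256):
    """Least-squares line a*x + b fitted to f over [lo, hi]."""
    x = np.linspace(lo, hi, n)
    a, b = np.polyfit(x, f(x), deg=1)
    max_err = np.max(np.abs(f(x) - (a * x + b)))
    return (a, b), max_err

def approximate_if_possible(activation, lo, hi, tol=0.05):
    (a, b), err = linear_fit(activation, lo, hi)
    if err <= tol:
        return {"slope": a, "intercept": b, "max_error": err}
    return None   # keep the exact activation (larger circuit, or software)

# tanh is close to linear near the origin, so a narrow range passes:
# approximate_if_possible(np.tanh, -0.5, 0.5)
```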
 For each group in the convolution layer information grouped by the convolution layer circuitization unit 501 and in the activation function information grouped by the activation layer circuitization unit 502, the circuit scale calculation unit 503 calculates the scale of the circuit that would result from circuitization. This scale can be calculated, for example, from the number of arithmetic units, such as adders and multipliers, contained in the circuit.
 FIG. 8 is a flowchart showing the processing of the circuit scale calculation unit 503. The operation of the circuit scale calculation unit 503 is described below with reference to FIG. 8.
 The circuit scale calculation unit 503 first obtains one group of convolution layers contained in the grouped convolution layer information (step S801). Based on the parameters stored in association with the obtained convolution layers, it determines the arithmetic units needed to circuitize the convolution of that group (step S802). From the type and number of the determined arithmetic units, it then calculates the circuit scale that circuitizing the convolution would require (step S803).
 The arithmetic unit selected when circuitizing an operation need not be the smallest one capable of executing it; a unit of deliberately larger circuit scale may be chosen. Because a circuit whose size is a power of two is efficient in terms of data access, the circuit scale of an arithmetic unit may be intentionally enlarged so that it becomes a power of two.
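 The power-of-two padding mentioned above could be modeled as follows; the operator cost weights are invented for the illustration:

```python
# Estimate a circuit's scale from its operator counts and pad the result up
# to the next power of two for data-access efficiency.
def next_pow2(n: int) -> int:
    p = 1
    while p < n:
        p <<= 1
    return p

def circuit_scale(n_adders: int, n_multipliers: int,
                  adder_cost: int = 1, mult_cost: int = 4) -> int:
    raw = n_adders * adder_cost + n_multipliers * mult_cost
    return next_pow2(raw)

# circuit_scale(9, 9) -> raw cost 45, padded to 64
```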
 The circuit scale calculation unit 503 then checks whether any groups of convolution layers remain in the grouped convolution layer information that have not yet been obtained (step S804). If so, processing returns to step S801, and the circuit scale calculation unit 503 performs steps S801 to S803 on the next group. In other words, it performs steps S801 to S803 on all of the convolution layer groups.
 Once all convolution layer groups have been obtained, the circuit scale calculation unit 503 obtains one group of activation functions, corresponding to an activation layer, from the grouped activation function information (step S805). It then determines the arithmetic units needed to circuitize that activation function (step S806), and from their type and number calculates the circuit scale that circuitizing the activation function would require (step S807).
 Next, the circuit scale calculation unit 503 judges whether the activation function can be linearly approximated (step S808). If it can, the approximating function is identified (step S809), the arithmetic units needed to circuitize the approximating function are determined (step S810), the circuit scale is calculated from the type and number of the determined units (step S811), and the calculated circuit scale is stored in association with the activation function.
 The circuit scale calculation unit 503 then checks whether any groups of activation functions remain in the grouped activation function information that have not yet been obtained (step S812). If so, processing returns to step S805, and the circuit scale calculation unit 503 performs steps S805 to S812 on the next group. In other words, it performs steps S805 to S812 on all of the activation function groups.
 The circuitization location determination unit 504 receives from the circuit scale calculation unit 503 the list of circuits needed to process the neural network together with their circuit scales, and, taking the capacity of the target FPGA into account, decides which parts of the neural network processing should be circuitized. Processing that is decided not to be circuitized is then executed in software on a processor such as a CPU. Two factors are weighed as the criteria for selecting the processing to circuitize (that is, for judging whether each piece of processing should be circuitized): whether the execution time of the neural network processing becomes shorter, and whether the accuracy of the neural network becomes higher.
 Normally, the execution time of neural network processing is shorter when a given piece of processing is circuitized than when it is executed in software. Therefore, when selecting processing to circuitize on the basis of execution time, the circuits in the list received from the circuit scale calculation unit 503 whose software execution time would be longer are judged to deserve circuitization with higher priority. The number of times each circuit is used during the neural network processing is also taken into account. For example, even if processing A takes 5 milliseconds in software and processing B takes 30 milliseconds, if A is used 20 times in the neural network and B only twice, then over the whole network A's total time (5 ms x 20 = 100 ms) exceeds B's (30 ms x 2 = 60 ms), so A is judged to deserve circuitization with higher priority than B.
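 The prioritization in this example reduces to a sort by total software time; a sketch, using the figures from the text and an assumed dict layout:

```python
# Rank candidate operations for circuitization by per-call software time
# multiplied by how often the network uses them.
candidates = [
    {"name": "A", "sw_time_ms": 5.0,  "uses": 20},   # 100 ms total
    {"name": "B", "sw_time_ms": 30.0, "uses": 2},    #  60 ms total
]

def circuitization_priority(ops):
    return sorted(ops, key=lambda op: op["sw_time_ms"] * op["uses"],
                  reverse=True)

# [op["name"] for op in circuitization_priority(candidates)] == ["A", "B"]
```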
 On the other hand, the accuracy of the neural network is essentially the same whether a given piece of processing is circuitized or executed in software. However, if an activation function is processed using a linear approximation, the accuracy may drop. The decision whether to circuitize activation function processing therefore has to weigh both the reduction in execution time and the loss of accuracy that circuitization brings. Regarding the accuracy loss due to linear approximation, the larger the error over the range of inputs the function can actually receive within the neural network (the difference between the approximated and exact outputs), the larger the loss of accuracy. Likewise, the more often an activation function is used within the network, the larger the accuracy loss from approximating it. The location where the activation function is used within the network also affects the degree of accuracy loss. For example, an activation function near the input layer, the first layer of the network, acts directly on the input data, so the accuracy loss from approximating it is large, whereas activation functions in later layers operate on data after feature extraction, so the loss is considered small. That said, it is not strictly necessary to rate activation functions closer to the input layer as suffering larger approximation losses. The circuitization location determination unit 504 combines these considerations to evaluate the overall loss of accuracy of the neural network. For example, it may compute the product of the error introduced by linearly approximating an activation function and the number of times that function is used within the network, and take this as the degree of accuracy loss due to the approximation.
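 The error-times-use-count score suggested above, with an optional depth weighting that the text notes need not be applied, might be sketched as:

```python
# Score the accuracy loss of linearly approximating one activation function.
def degradation_score(max_error: float, use_count: int,
                      depth: int = 0, total_depth: int = 1,
                      weight_by_depth: bool = False) -> float:
    score = max_error * use_count
    if weight_by_depth and total_depth > 0:
        # Assumption: layers nearer the input weigh more heavily.
        score *= 2.0 - depth / total_depth
    return score
```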
 Because circuitization beyond the FPGA's capacity is impossible, the FPGA capacity imposes an upper limit on how much processing can be circuitized. At the same time, the more processing is circuitized, the greater the reduction in the execution time of the neural network processing. Moreover, circuitizing an activation function that contains nonlinear processing yields a larger circuit than circuitizing one that contains only linear processing, so circuitizing a nonlinear activation function reduces, under the FPGA capacity constraint, the number of other pieces of processing that can be circuitized. Linearly approximating the activation function before circuitization avoids this problem, but at the cost of the accuracy loss the approximation introduces. In short, the execution time of the neural network processing (that is, the number of pieces of processing circuitized) and the accuracy of the neural network are in a trade-off. The circuitization location determination unit 504 therefore decides what to circuitize by balancing these conflicting factors. One conceivable approach is to take a weighted linear sum of the execution time and the accuracy as an evaluation function and choose the circuitized portions so as to minimize its value. Constraints may be added, such as an upper limit on the execution time, a lower limit on the accuracy, and upper limits on the shared memory capacity and data transfer bus bandwidth required between the CPU and the FPGA. Since extreme increases in execution time or drops in accuracy are undesirable, a nonlinear function that penalizes such increases and drops may also be used as the evaluation function.
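 A greedy sketch of this selection under an FPGA capacity budget, standing in for whatever optimizer an implementation would actually use, with invented weights and field names:

```python
# Choose operations to circuitize so that a weighted combination of time
# saved and accuracy lost is maximized without exceeding the FPGA capacity.
def select_circuits(ops, capacity, w_time=1.0, w_acc=10.0):
    """ops: dicts with 'scale', 'time_saved_ms', 'accuracy_loss'."""
    def gain(op):
        return (w_time * op["time_saved_ms"]
                - w_acc * op["accuracy_loss"]) / max(op["scale"], 1)

    chosen, used = [], 0
    for op in sorted(ops, key=gain, reverse=True):
        if gain(op) <= 0:
            continue                 # circuitizing this operation does not pay
        if used + op["scale"] <= capacity:
            chosen.append(op)
            used += op["scale"]
    return chosen
```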
 As described above, according to the neural network device 100 of the first embodiment, neural network processing is designed by judging, for each piece of processing, whether to circuitize it on the FPGA or to execute it in software on a processor such as a CPU, taking into account both the reduction in execution time and the loss of accuracy that circuitization brings. This makes it possible to design a neural network in which the processor and the circuits on the FPGA process data cooperatively, and to realize a neural network capable of high-speed processing on hardware with limited resources.
 FIGS. 9 and 10 are diagrams each showing an example of the hardware configuration of the neural network construction unit 101. The functions of the constituent elements of the neural network construction unit 101 shown in FIG. 1 are realized, for example, by the processing circuit 10 shown in FIG. 9. That is, the neural network construction unit 101 includes a processing circuit 10 for determining, for each operation that makes up the neural network, whether to circuitize the operation or process it in software, and for creating and outputting the circuit information for circuitizing the operations determined to be circuitized and the program for software-processing the operations determined to be processed in software. The processing circuit 10 may be dedicated hardware, or it may be configured using a processor (also called a central processing unit (CPU), a processing device, an arithmetic device, a microprocessor, a microcomputer, or a DSP (Digital Signal Processor)) that executes a program stored in a memory.
 When the processing circuit 10 is dedicated hardware, it may be, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a combination of these. The functions of the constituent elements of the neural network construction unit 101 may each be realized by an individual processing circuit, or those functions may be realized collectively by a single processing circuit.
 FIG. 10 shows an example of the hardware configuration of the neural network construction unit 101 when the processing circuit 10 is configured using a processor 11 that executes a program. In this case, the functions of the constituent elements of the neural network construction unit 101 are realized by software or the like (software, firmware, or a combination of software and firmware). The software or the like is written as a program and stored in the memory 12. The processor 11 realizes the functions of each unit by reading and executing the program stored in the memory 12. That is, the neural network construction unit 101 includes the memory 12 for storing a program that, when executed by the processor 11, results in the execution of: a process of determining, for each operation that makes up the neural network, whether to circuitize the operation or process it in software; and a process of creating and outputting the circuit information for circuitizing the operations so determined and the program for software-processing the operations so determined. In other words, this program can be said to cause a computer to execute the procedures and methods of operation of the constituent elements of the neural network construction unit 101.
 Here, the memory 12 may be, for example, a nonvolatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read Only Memory); an HDD (Hard Disk Drive); a magnetic disk, a flexible disk, an optical disc, a compact disc, a MiniDisc, or a DVD (Digital Versatile Disc) and its drive device; or any storage medium to be used in the future.
 The above has described configurations in which the functions of the constituent elements of the neural network construction unit 101 are realized entirely by hardware or entirely by software or the like. The configuration is not limited to this, however: some constituent elements of the neural network construction unit 101 may be realized by dedicated hardware while others are realized by software or the like. For example, the functions of some constituent elements can be realized by the processing circuit 10 as dedicated hardware, while the functions of other constituent elements can be realized by the processing circuit 10 as the processor 11 reading and executing the program stored in the memory 12.
 As described above, the neural network construction unit 101 can realize each of the functions above by hardware, software or the like, or a combination thereof.
 <Embodiment 2>
 FIG. 11 is a block diagram showing the configuration of the neural network device 100 according to the second embodiment. In FIG. 11, elements identical or equivalent to those shown in the first embodiment (FIG. 1) are given the same reference numerals, and their description is omitted here.
 As shown in FIG. 11, the neural network device 100 according to the second embodiment includes a neural network execution unit 901 in addition to the neural network construction unit 101. The neural network execution unit 901 includes a storage unit 905, a CPU 902, an FPGA 903, a memory 904, and a data acquisition circuit 906.
 Based on the program and circuit information created by the neural network construction unit 101, the neural network execution unit 901 executes the arithmetic processing of the neural network, with the CPU 902 and the FPGA 903 processing data cooperatively.
 The storage unit 905 stores the program and circuit information created by the neural network construction unit 101. The CPU 902 reads the program stored in the storage unit 905 and, based on it, performs the neural network arithmetic processing assigned to the CPU 902 and controls the FPGA 903. The FPGA 903 reads the circuit information stored in the storage unit 905, configures arithmetic circuits based on it, and performs the neural network arithmetic processing assigned to the FPGA 903.
 The memory 904 relays data exchanged between the CPU 902 and the FPGA 903. More specifically, the CPU 902 stores in the memory 904 the input data for computations that use the circuits built on the FPGA 903, and the FPGA 903 reads this input data and uses it for those computations. The FPGA 903 in turn stores its computation results in the memory 904, and the CPU 902 reads them from the memory 904 and uses them for software processing.
 The data acquisition circuit 906 is used when the FPGA 903 reads data from the memory 904. In this embodiment, the data acquisition circuit 906 is built on the FPGA 903 as one of its arithmetic circuits.
 In general, when an FPGA reads data from an external memory to perform a computation, it starts acquiring the data upon being notified that the necessary data has been stored at a predetermined location in the memory. The data acquisition circuit 906 makes this notification unnecessary. Specifically, the size of the input data for each circuit on the FPGA 903 is fixed in advance, and as soon as data of that size has accumulated in the memory 904, the data acquisition circuit 906 automatically transfers it to the FPGA 903. Since the input data size is fixed at the stage where the neural network construction unit 101 decides the circuit configuration, it can be calculated by the data acquisition circuit control data generation unit 303 of the neural network construction unit 101. The neural network construction unit 101 includes the input data size of each circuit, calculated by the data acquisition circuit control data generation unit 303, in the circuit information and stores it in the storage unit 905.
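 A software model of this autonomous fetch, with invented names throughout (the actual mechanism is a hardware circuit configured from the circuit information):

```python
# Model of the data acquisition behaviour: each circuit's input size is
# known in advance, and computation starts as soon as the shared memory
# holds that many bytes, without an execution command from the CPU.
class DataAcquisitionModel:
    def __init__(self, input_sizes):
        self.input_sizes = input_sizes   # circuit id -> expected bytes
        self.buffers = {cid: bytearray() for cid in input_sizes}

    def on_write(self, circuit_id, chunk, start):
        """Called whenever the CPU writes a chunk into shared memory."""
        buf = self.buffers[circuit_id]
        buf.extend(chunk)
        if len(buf) >= self.input_sizes[circuit_id]:
            data, self.buffers[circuit_id] = bytes(buf), bytearray()
            start(circuit_id, data)      # begin computation autonomously
```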
 Although the CPU 902, the FPGA 903, and the memory 904 are shown as separate blocks in FIG. 11, they may all be mounted on a single-chip SoC (System-on-a-Chip).
 The embodiments may be freely combined, and each embodiment may be modified or omitted as appropriate.
 The above description is in all respects illustrative, and countless variations not exemplified here can be envisaged.
 100 neural network device, 101 neural network construction unit, 102 neural network analysis unit, 103 neural network calculation method output unit, 104 storage unit, 201 network structure analysis unit, 202 neural network division unit, 301 control program creation unit, 302 arithmetic circuit creation unit, 303 data acquisition circuit control data generation unit, 401 arithmetic structure classification unit, 402 convolution layer analysis unit, 403 activation layer analysis unit, 501 convolution layer circuitization unit, 502 activation layer circuitization unit, 503 circuit scale calculation unit, 504 circuitization location determination unit, 901 neural network execution unit, 902 CPU, 903 FPGA, 904 memory, 905 storage unit, 906 data acquisition circuit, 10 processing circuit, 11 processor, 12 memory.

Claims (6)

  1.  A neural network device comprising:
     a neural network analysis unit that determines, for each operation constituting a neural network, a calculation method, namely whether to circuitize the operation or to process it in software; and
     a neural network calculation method output unit that creates and outputs circuit information for circuitizing the operations determined to be circuitized and a program for software-processing the operations determined to be processed in software, wherein
     the neural network analysis unit includes:
     a network structure analysis unit that analyzes the arithmetic structure of the neural network; and
     a neural network division unit that determines, for each of the operations obtained by dividing the neural network, whether to circuitize the operation or to process it in software,
     the network structure analysis unit includes:
     an arithmetic structure classification unit that classifies each layer of the neural network according to the type of operation constituting the layer; and
     a convolution layer analysis unit that identifies parameters of the convolution operation for each layer classified by the arithmetic structure classification unit as a layer performing a convolution operation, and
     the neural network division unit includes:
     a convolution layer circuitization unit that groups layers having identical or similar parameters based on the parameters identified by the convolution layer analysis unit;
     a circuit scale calculation unit that calculates, for each convolution operation of the layers grouped by the convolution layer circuitization unit, the circuit scale that circuitizing the convolution operation would require; and
     a circuitization location determination unit that determines the operations to be circuitized based on the circuit scale calculated by the circuit scale calculation unit.
  2.  The neural network device according to claim 1, wherein
     the neural network calculation method output unit includes:
     an arithmetic circuit creation unit that creates the circuit information; and
     a control program creation unit that creates the program, and
     the program created by the control program creation unit includes a control program for managing the input and output of an arithmetic circuit constructed based on the circuit information.
  3.  The neural network device according to claim 1 or claim 2, wherein
     the network structure analysis unit further includes an activation layer analysis unit that identifies the activation function used in each layer classified by the arithmetic structure classification unit as a layer performing processing based on an activation function,
     the neural network division unit further includes an activation layer circuitization unit that groups identical activation functions among the activation functions used in the layers so classified, and that stores, for those grouped activation functions that can be linearly approximated, the linear approximation function obtained by the approximation in association with the activation function, and
     the circuit scale calculation unit further calculates the circuit scale that circuitizing each of the grouped activation functions and the linear approximation functions would require.
  4.  The neural network device according to any one of claims 1 to 3, further comprising a neural network execution unit that executes neural network processing based on the circuit information and the program output by the neural network calculation method output unit, wherein
     the neural network execution unit includes:
     a storage unit that stores the circuit information and the program;
     a CPU that executes the program;
     an FPGA that constructs arithmetic circuits based on the circuit information and executes computations using the arithmetic circuits; and
     a memory for relaying data between the CPU and the FPGA.
  5.  The neural network device according to claim 4, wherein
     the neural network execution unit further includes a data acquisition circuit that automatically acquires, from the memory, the data passed from the CPU to the FPGA via the memory and loads it into the FPGA, and
     the neural network calculation method output unit further includes a data acquisition circuit control data generation unit that creates data for controlling the data acquisition circuit based on the circuit information.
  6.  The neural network device according to claim 5, wherein the data for controlling the data acquisition circuit is data indicating the size of the input data of the arithmetic circuit.
PCT/JP2022/010523 2022-03-10 2022-03-10 Neural network device WO2023170855A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2022543749A JP7179237B1 (en) 2022-03-10 2022-03-10 neural network device
PCT/JP2022/010523 WO2023170855A1 (en) 2022-03-10 2022-03-10 Neural network device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/010523 WO2023170855A1 (en) 2022-03-10 2022-03-10 Neural network device

Publications (1)

Publication Number Publication Date
WO2023170855A1 true WO2023170855A1 (en) 2023-09-14

Family

ID=84227638

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/010523 WO2023170855A1 (en) 2022-03-10 2022-03-10 Neural network device

Country Status (2)

Country Link
JP (1) JP7179237B1 (en)
WO (1) WO2023170855A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013125419A (en) * 2011-12-14 2013-06-24 Fuji Xerox Co Ltd Hardware-software collaborative design device and program
US20190266504A1 (en) * 2019-05-09 2019-08-29 Intel Corporation Using computational cost and instantaneous load analysis for intelligent deployment of neural networks on multiple hardware executors
JP2020129404A (en) * 2015-10-28 2020-08-27 グーグル エルエルシー Processing computational graphs


Also Published As

Publication number Publication date
JPWO2023170855A1 (en) 2023-09-14
JP7179237B1 (en) 2022-11-28

Similar Documents

Publication Publication Date Title
CN111652368B (en) Data processing method and related product
WO2019216404A1 (en) Neural network construction device, information processing device, neural network construction method, and program
EP3525119B1 (en) Fpga converter for deep learning models
US7424595B2 (en) System for managing circuitry of variable function information processing circuit and method for managing circuitry of variable function information processing circuit
WO2001090887A1 (en) Method for processing program for high-speed processing by using dynamically reconfigurable hardware and program for executing the processing method
WO2021044241A1 (en) Deep neural network on field-programmable gate array
KR102167747B1 (en) Apparatus and Method of managing Mobile device memory for analyzing a user utilization pattern by a neural network algorithm to predict a next application
JP7059214B2 (en) Arithmetic logic unit
JP2001229217A (en) Higher-order synthesizing method and recording medium used for its implementation
CN110968404B (en) Equipment data processing method and device
JP4968478B2 (en) Method for reconstructing a statement and computer system having the function
CN114970866B (en) Quantum computing task computing method, device and readable storage medium
Masadeh et al. A quality-assured approximate hardware accelerators–based on machine learning and dynamic partial reconfiguration
WO2023170855A1 (en) Neural network device
Perepelitsyn et al. Technological Stack for Implementation of AI as a Service based on Hardware Accelerators
Eassa et al. RISC-V based implementation of Programmable Logic Controller on FPGA for Industry 4.0
US20190057125A1 (en) System and method for managing log data
EP4066146A1 (en) Systems and methods for implementing operational transformations for restricted computations of a mixed-signal integrated circuit
Wang et al. Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation
US6233732B1 (en) Compiling system using intermediate codes to store a plurality of values
JP3370304B2 (en) High-level synthesis system, high-level synthesis method, and recording medium used for implementing high-level synthesis method
US20230177351A1 (en) Accelerating decision tree inferences based on tensor operations
US11347490B1 (en) Compilation framework for hardware configuration generation
JP7042870B2 (en) Methods, devices, devices and computer-readable storage media performed by computing devices
EP3997593B1 (en) A streaming compiler for automatic adjoint differentiation

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2022543749

Country of ref document: JP

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22930836

Country of ref document: EP

Kind code of ref document: A1