WO2023170855A1 - Neural network device - Google Patents

Neural network device

Info

Publication number
WO2023170855A1
Authority: WIPO (PCT)
Prior art keywords: neural network, unit, circuit, layer, calculation
Application number: PCT/JP2022/010523
Other languages: French (fr), Japanese (ja)
Inventors: 督 那須, 知嘉子 中西
Original Assignee: 三菱電機株式会社 (Mitsubishi Electric Corporation)
Priority date: the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.
Application filed by 三菱電機株式会社 (Mitsubishi Electric Corporation)
Priority to JP2022543749A (patent JP7179237B1)
Priority to PCT/JP2022/010523
Publication of WO2023170855A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks

Definitions

  • The present disclosure relates to artificial intelligence technology, and particularly to an apparatus and method for creating a program that processes a neural network.
  • In fields such as image processing, neural networks can perform processing with extremely high accuracy and have recently come into wide use. At the same time, neural networks involve a large number of calculations and are known to have a high processing load. To complete processing within a desired time, neural networks are therefore often realized on dedicated processors such as GPGPUs (General Purpose Graphics Processing Units) or on hardware such as FPGAs (Field Programmable Gate Arrays) and ASICs (Application Specific Integrated Circuits).
  • Although a neural network includes a large number of operations, it is composed of a combination of operations such as convolutions, activation functions, and full connections, and its structure is relatively simple. Patent Document 1 below (JP 2019-139747 A) proposes a technique that utilizes this characteristic of neural networks to solve the problem of the large amount of calculation.
  • The technique of Patent Document 1 compiles different network structures into operations of the same arithmetic unit by controlling that unit with a single instruction corresponding to each layer operation of the multi-layer computation of a neural network, so that the same device can realize the logic operations of all layers.
  • The technique of Patent Document 1 can use hardware resources efficiently and achieve high-speed neural network processing by chaining single instructions prepared in advance on the hardware according to the network structure. However, because processing is performed with combinations of single instructions prepared in advance on the hardware, there is a concern that processing may not be completed within the desired time.
  • The present disclosure has been made to solve the above problem, and aims to enable the design of neural network processing that can achieve desired performance (execution time and accuracy) on hardware with fewer resources.
  • A neural network device according to the present disclosure includes: a neural network analysis unit that determines, for each operation constituting a neural network, a calculation method, namely whether to implement the operation as a circuit or process it in software; and a neural network calculation method output unit that creates and outputs circuit information for circuitizing the operations determined to be circuitized and a program for software-processing the operations determined to be processed in software. The neural network analysis unit has a network structure analysis unit that analyzes the calculation structure of the neural network, and a neural network division unit that determines, for each operation obtained by dividing the neural network, whether to circuitize the operation or process it in software. The network structure analysis unit has a calculation structure classification unit that classifies each layer of the neural network according to the types of operations constituting the layer, and a convolution layer analysis unit that specifies the parameters of the convolution operation for each layer classified as performing a convolution. The neural network division unit has a convolutional layer circuitization unit that groups layers having identical or similar parameters based on the parameters specified by the convolution layer analysis unit, a circuit scale calculation unit that calculates, for each convolution operation of the grouped layers, the circuit scale when that operation is circuitized, and a circuitization location determination unit that determines the operations to be circuitized based on the circuit scales calculated by the circuit scale calculation unit.
  • FIG. 1 is a block diagram showing the configuration of a neural network device according to Embodiment 1.
  • FIG. 2 is a block diagram showing the configuration of the neural network analysis unit according to Embodiment 1.
  • FIG. 3 is a block diagram showing the configuration of the neural network calculation method output unit according to Embodiment 1.
  • FIG. 4 is a block diagram showing the configuration of the network structure analysis unit according to Embodiment 1.
  • FIG. 5 is a block diagram showing the configuration of the neural network division unit according to Embodiment 1.
  • FIG. 6 is a flowchart showing the processing of the convolutional layer circuitization unit according to Embodiment 1.
  • FIG. 7 is a flowchart showing the processing of the activation layer circuitization unit according to Embodiment 1.
  • FIG. 8 is a flowchart showing the processing of the circuit scale calculation unit according to Embodiment 1.
  • FIGS. 9 and 10 are diagrams each showing an example of the hardware configuration of the neural network construction unit.
  • FIG. 11 is a block diagram showing the configuration of a neural network device according to Embodiment 2.
  • FIG. 1 is a block diagram showing the configuration of the neural network device 100 according to Embodiment 1.
  • As shown in FIG. 1, the neural network device 100 includes a neural network construction unit 101, which has a neural network analysis unit 102, a neural network calculation method output unit 103, and a storage unit 104.
  • The neural network analysis unit 102 reads the network structure data of the neural network stored in the storage unit 104, analyzes the network structure, determines a calculation method, including the program and circuits for operating the neural network, and outputs the determined calculation method. That is, the neural network analysis unit 102 determines, for each operation constituting the neural network, a calculation method, namely whether to circuitize the operation or process it in software.
  • Based on the calculation method received from the neural network analysis unit 102, the neural network calculation method output unit 103 creates and outputs the data of a program that runs on a processor such as a CPU (Central Processing Unit) and circuit information for constructing arithmetic circuits on an FPGA. That is, the neural network calculation method output unit 103 creates and outputs circuit information for circuitizing the operations that the neural network analysis unit 102 determined to circuitize, and a program for software-processing the operations that it determined to process in software.
  • FIG. 2 is a block diagram showing the configuration of the neural network analysis unit 102.
  • As shown in FIG. 2, the neural network analysis unit 102 includes a network structure analysis unit 201 and a neural network division unit 202.
  • The network structure analysis unit 201 analyzes the calculation structure within the network based on the network structure data of the neural network read from the storage unit 104, and outputs the analysis result.
  • The neural network division unit 202 receives the analysis result of the calculation structure in the network from the network structure analysis unit 201, divides the operations constituting the network structure into operation units of a predetermined size, determines whether each divided operation is to be processed by the CPU or by the FPGA (that is, software-processed or circuitized), and outputs the determination result in association with each divided operation.
  • FIG. 3 is a block diagram showing the configuration of the neural network calculation method output unit 103.
  • As shown in FIG. 3, the neural network calculation method output unit 103 includes a control program creation unit 301, an arithmetic circuit creation unit 302, and a data acquisition circuit control data generation unit 303.
  • The control program creation unit 301 receives the calculation method for each process of the neural network determined by the neural network analysis unit 102, and creates and outputs a CPU program for the operations to be software-processed on the CPU.
  • The program also includes a control program that manages the input and output of the arithmetic circuits running on the FPGA and, by controlling the FPGA, enables processing of the entire neural network.
  • The control program performs processing such as, for example, inputting data to arithmetic circuit A, receiving the calculation result from arithmetic circuit A, and inputting it to arithmetic circuit B.
  • Part of the calculation processing may also be included in the control program so that it is performed on the CPU rather than on the FPGA.
  • In that case, the control program performs processing such as, for example, calculating the product of the outputs of arithmetic circuit A and arithmetic circuit B.
  • The program output by the control program creation unit 301 is, for example, the binary data of an executable program obtained by compiling code written in C or the like with a compiler for a specific CPU. A sketch of what such a control program might look like appears below.
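  • The following is a minimal C sketch of the control-program flow described above. The helper functions fpga_write_input and fpga_read_output and the circuit identifiers are assumptions introduced for illustration; the actual shared-memory interface is whatever the control program creation unit 301 generates.

```c
#include <stddef.h>

/* Hypothetical shared-memory helpers, not from the patent. */
extern void fpga_write_input(int circuit_id, const float *data, size_t n);
extern void fpga_read_output(int circuit_id, float *data, size_t n);

#define CIRCUIT_A 0
#define CIRCUIT_B 1

/* Feed data through circuit A, pass its result on to circuit B, and
 * combine the two outputs on the CPU (a dot product here, standing in
 * for the "product of the outputs" example above). */
float run_fragment(const float *in, size_t n, float *buf_a, float *buf_b)
{
    fpga_write_input(CIRCUIT_A, in, n);     /* input data to circuit A */
    fpga_read_output(CIRCUIT_A, buf_a, n);  /* receive result from A   */
    fpga_write_input(CIRCUIT_B, buf_a, n);  /* forward it to circuit B */
    fpga_read_output(CIRCUIT_B, buf_b, n);

    float combined = 0.0f;                  /* CPU-side part of the work */
    for (size_t i = 0; i < n; i++)
        combined += buf_a[i] * buf_b[i];
    return combined;
}
```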
  • The arithmetic circuit creation unit 302 receives the calculation method, including the program and circuits for operating the neural network, determined by the neural network analysis unit 102, and creates and outputs circuit information for constructing the arithmetic circuits that operate on the FPGA.
  • The data acquisition circuit control data generation unit 303 calculates, based on the circuit information, the parameters to be provided to a dedicated circuit that receives data from shared memory or the like, which is used when the FPGA receives data from the CPU. Suppose, for example, that an arithmetic circuit A on the FPGA receives data of a fixed size (data width) and calculates the average of a certain number of data values contained in it, and that circuit A starts its operation as soon as data of that size has been stored in the shared memory, without receiving an execution command from the CPU; that is, circuit A acquires its data autonomously. The parameters above serve to build the "fixed size" value that enables this operation into the arithmetic circuit on the FPGA in advance, as sketched below.
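  • A small C sketch of how that "fixed size" parameter might be derived from the circuit information; the struct fields are illustrative assumptions, not names from the patent.

```c
#include <stddef.h>

struct circuit_info {
    int    circuit_id;
    size_t inputs_per_run;  /* values one invocation of the circuit consumes   */
    size_t bytes_per_input; /* e.g. sizeof(float), or 2 for 16-bit fixed point */
};

/* The autonomous-start threshold: once this many bytes are present in
 * shared memory, the circuit may begin computing without a CPU command. */
size_t acquisition_size_bytes(const struct circuit_info *ci)
{
    return ci->inputs_per_run * ci->bytes_per_input;
}
```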
  • FIG. 4 is a block diagram showing the configuration of the network structure analysis unit 201.
  • As shown in FIG. 4, the network structure analysis unit 201 includes a calculation structure classification unit 401, a convolution layer analysis unit 402, and an activation layer analysis unit 403.
  • Based on the network structure data of the neural network read from the storage unit 104, the calculation structure classification unit 401 analyzes what kinds of operations each layer in the neural network is composed of, and outputs the analysis result as calculation information associated with each layer.
  • The operation associated with a layer is, for example, a convolution operation or an operation using an activation function.
  • The operations associated with the layers may also include operations other than convolutions and activation functions. In that case, the other operations may either be circuitized or processed by the CPU.
  • For example, the operations constituting a fully connected layer may be circuitized as they are, or may be treated as a convolution with large parameters and circuitized as a convolution layer.
  • The convolution layer analysis unit 402 receives the calculation information of each layer in the neural network from the calculation structure classification unit 401 and analyzes the layers that perform convolution operations. Specifically, it specifies the parameters of the convolution operation performed in each layer that performs one.
  • The parameters include, for example, the input size, the output size, the number of kernels, the kernel size, and the BorderMode. A record of these parameters might look like the sketch below.
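  • An illustrative C record of the per-layer convolution parameters listed above; the field names and the border-mode encoding are assumptions.

```c
/* Parameters the convolution layer analysis unit 402 extracts per layer. */
struct conv_params {
    int in_h, in_w, in_ch;    /* input size                           */
    int out_h, out_w, out_ch; /* output size                          */
    int num_kernels;          /* number of kernels                    */
    int kernel_h, kernel_w;   /* kernel size                          */
    int border_mode;          /* BorderMode, e.g. 0 = valid, 1 = same */
};
```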
  • The activation layer analysis unit 403 receives the calculation information of each layer in the neural network from the calculation structure classification unit 401 and analyzes the layers that perform processing based on an activation function. Specifically, it specifies which activation function is used in each such layer.
  • FIG. 5 is a diagram showing the configuration of the neural network division unit 202.
  • As shown in FIG. 5, the neural network division unit 202 includes a convolutional layer circuitization unit 501, an activation layer circuitization unit 502, a circuit scale calculation unit 503, and a circuitization location determination unit 504.
  • The convolutional layer circuitization unit 501 receives the analysis result from the network structure analysis unit 201 and outputs information related to circuitizing the convolution operations.
  • The activation layer circuitization unit 502 receives the analysis result from the network structure analysis unit 201 and outputs information related to circuitizing the activation functions.
  • The circuit scale calculation unit 503 receives the circuitization information output by the convolutional layer circuitization unit 501 and the activation layer circuitization unit 502, and calculates the circuit scale when each operation is circuitized.
  • The circuitization location determination unit 504 receives the circuit scale of each operation calculated by the circuit scale calculation unit 503, and separates the operations to be circuitized from the operations to be processed, without circuitization, by a processor such as the CPU.
  • FIG. 6 is a flowchart showing the processing of the convolutional layer circuitization unit 501. The operation of the convolutional layer circuitization unit 501 is described below with reference to FIG. 6.
  • The convolutional layer circuitization unit 501 first obtains the information of one convolutional layer from the per-layer calculation information received from the network structure analysis unit 201 (step S601). It then checks the parameters included in the acquired information, associates the parameter information with that convolutional layer, and saves it as convolutional layer information (step S602).
  • Next, the convolutional layer circuitization unit 501 checks whether any convolutional layer whose information has not yet been acquired remains in the per-layer calculation information (step S603). If such a layer remains, the process returns to step S601, and the convolutional layer circuitization unit 501 performs steps S601 and S602 for the next convolutional layer.
  • Once the information of all convolutional layers has been acquired, the convolutional layer circuitization unit 501 extracts, from the convolutional layers corresponding to the saved convolutional layer information, the layers having the same parameters, and groups those layers (step S604). In other words, the convolutional layer circuitization unit 501 groups the convolutional layers by the parameters they have.
  • Here the unit of grouping is a layer, but the product-sum operations constituting the convolution within a layer may be used as an even smaller grouping unit.
  • Next, the convolutional layer circuitization unit 501 checks whether the number of created groups is greater than or equal to a predetermined number (step S605). If it is, the convolutional layer circuitization unit 501 further groups together groups having similar parameters (step S606). "Similar" here may be defined as parameters being close to each other within a certain range, or as an inclusion relation in which the parameters of one group contain the parameters of the other. Groups whose parameters are in an inclusion relation are merged because, if the containing operation is circuitized, the contained operation can be executed using part of that circuit. When the grouping result is saved, information on how many times the operations belonging to each group are used in the entire neural network is saved in association with it. A sketch of this grouping procedure appears below.
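  • The following is a minimal C sketch of steps S601 to S604 (grouping by identical parameters), using the conv_params record sketched earlier; the similar-parameter merging of steps S605 and S606 is only indicated in a comment. All names and limits are assumptions.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LAYERS 256
#define MAX_GROUPS 256

struct conv_group {
    struct conv_params params;  /* representative parameters of the group */
    int layer_ids[MAX_LAYERS];  /* layers belonging to the group          */
    int n_layers;               /* also the "times used" count saved with
                                 * the grouping result                    */
};

static bool params_equal(const struct conv_params *a,
                         const struct conv_params *b)
{
    /* conv_params holds only ints, so a byte comparison suffices here. */
    return memcmp(a, b, sizeof *a) == 0;
}

/* Steps S601-S604: visit every convolutional layer and group layers
 * whose parameters are identical. Returns the number of groups. */
int group_conv_layers(const struct conv_params *layers, int n,
                      struct conv_group *groups)
{
    int n_groups = 0;
    for (int i = 0; i < n; i++) {
        int g;
        for (g = 0; g < n_groups; g++)
            if (params_equal(&layers[i], &groups[g].params))
                break;
        if (g == n_groups) {            /* first layer with these params */
            groups[g].params = layers[i];
            groups[g].n_layers = 0;
            n_groups++;
        }
        groups[g].layer_ids[groups[g].n_layers++] = i;
    }
    /* Steps S605-S606 (not shown): if n_groups is at or above a
     * predetermined number, merge groups whose parameters are close to
     * each other or stand in an inclusion relation. */
    return n_groups;
}
```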
  • FIG. 7 is a flowchart showing the processing of the activation layer circuitization unit 502. The operation of the activation layer circuitization unit 502 is described below with reference to FIG. 7.
  • The activation layer circuitization unit 502 first obtains the activation function of each layer included in the per-layer calculation information received from the network structure analysis unit 201, and saves it as activation function information (step S701). The activation layer circuitization unit 502 then extracts the identical activation functions from the saved activation functions and groups them (step S702). In other words, the activation layer circuitization unit 502 groups the layers that perform processing based on an activation function by activation function. When the grouping result is saved, information on how many times the operations belonging to each group are used in the entire neural network is saved in association with it.
  • Next, the activation layer circuitization unit 502 obtains one of the grouped activation functions (step S703) and checks whether the activation function can be linearly approximated (step S704). If it can, the linearly approximated function is saved in association with the activation function (step S705).
  • The activation layer circuitization unit 502 then checks whether any of the grouped activation functions has not yet been obtained (step S706). If one remains, the process returns to step S703, and the activation layer circuitization unit 502 performs steps S703 to S706 for the next activation function. In other words, the activation layer circuitization unit 502 performs steps S703 to S706 on all of the grouped activation functions. A sketch of this grouping appears below.
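  • A minimal C sketch of the activation-function grouping of steps S701 to S706. The enum of activation functions and the approximability rule (piecewise-linear "hard" variants for sigmoid and tanh) are illustrative assumptions.

```c
#include <stdbool.h>

enum act_fn { ACT_RELU, ACT_SIGMOID, ACT_TANH };

struct act_group {
    enum act_fn fn;
    int  times_used;    /* occurrences in the whole network, saved with
                         * the grouping result                          */
    bool approximable;  /* steps S703-S705: linear approximation known? */
};

static bool linearly_approximable(enum act_fn fn)
{
    switch (fn) {
    case ACT_RELU:    return true;  /* already piecewise linear          */
    case ACT_SIGMOID: return true;  /* e.g. a hard-sigmoid approximation */
    case ACT_TANH:    return true;  /* e.g. a hard-tanh approximation    */
    }
    return false;
}

/* Steps S701-S702: group the layers' activation functions by kind,
 * counting how often each is used. Returns the number of groups. */
int group_activations(const enum act_fn *layers, int n,
                      struct act_group *groups)
{
    int n_groups = 0;
    for (int i = 0; i < n; i++) {
        int g;
        for (g = 0; g < n_groups; g++)
            if (groups[g].fn == layers[i])
                break;
        if (g == n_groups) {
            groups[g].fn = layers[i];
            groups[g].times_used = 0;
            groups[g].approximable = linearly_approximable(layers[i]);
            n_groups++;
        }
        groups[g].times_used++;
    }
    return n_groups;
}
```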
  • The circuit scale calculation unit 503 calculates the circuit scale when each group of the convolutional layer information grouped by the convolutional layer circuitization unit 501, and each group of the activation function information grouped by the activation layer circuitization unit 502, is circuitized. The circuit scale can be calculated, for example, from the number of arithmetic units, such as adders and multipliers, included in the circuit.
  • FIG. 8 is a flowchart showing the processing of the circuit scale calculation unit 503. The operation of the circuit scale calculation unit 503 is described below with reference to FIG. 8.
  • The circuit scale calculation unit 503 first obtains one group of convolutional layers included in the grouped convolutional layer information (step S801). It then determines the arithmetic units necessary for circuitizing the convolution of that layer group, based on the parameters saved in association with the acquired convolutional layers (step S802). It then calculates, from the type and number of the determined arithmetic units, the circuit scale when the convolution is circuitized (step S803).
  • The arithmetic units selected when circuitizing an operation need not be the smallest arithmetic units capable of executing it; arithmetic units with a larger circuit scale may be selected. This is because data access is more efficient when the circuit size is a power of two, so the circuit scale of the arithmetic units is intentionally increased so that it becomes a power of two. A sketch of such an estimate follows.
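  • A rough C sketch of the per-group estimate of steps S801 to S803, using the conv_params record from earlier. Counting one multiplier per kernel tap, an adder tree to sum the products, rounding the unit count up to a power of two, and the relative cost weights are all illustrative assumptions.

```c
#include <stddef.h>

static size_t next_pow2(size_t x)
{
    size_t p = 1;
    while (p < x)
        p <<= 1;
    return p;
}

/* Steps S802-S803: estimate the circuit scale of one convolution group
 * from the arithmetic units a fully parallel kernel would need. */
size_t conv_circuit_scale(const struct conv_params *p)
{
    size_t taps  = (size_t)p->kernel_h * p->kernel_w * p->in_ch;
    size_t mults = next_pow2(taps);  /* deliberately oversized to a
                                      * power of two, as described above */
    size_t adds  = mults - 1;        /* adder tree summing the products  */

    const size_t MULT_COST = 4, ADD_COST = 1;  /* relative unit sizes */
    return mults * MULT_COST + adds * ADD_COST;
}
```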
  • Next, the circuit scale calculation unit 503 checks whether any group of convolutional layers that has not yet been acquired remains in the grouped convolutional layer information (step S804). If one remains, the process returns to step S801, and the circuit scale calculation unit 503 performs steps S801 to S803 on the next group. In other words, the circuit scale calculation unit 503 performs steps S801 to S803 for all groups of convolutional layers.
  • The circuit scale calculation unit 503 then obtains one activation function group corresponding to an activation layer included in the grouped activation function information (step S805). Next, the circuit scale calculation unit 503 determines the arithmetic units necessary for circuitizing the activation function (step S806). It then calculates, from the type and number of those arithmetic units, the circuit scale when the activation function is circuitized (step S807).
  • Next, the circuit scale calculation unit 503 determines whether the activation function can be linearly approximated (step S808). If it can, the approximating function is specified (step S809), the arithmetic units necessary for circuitizing the approximating function are determined (step S810), the circuit scale is calculated from the type and number of the determined arithmetic units (step S811), and the calculated circuit scale is saved in association with the activation function.
  • The circuit scale calculation unit 503 then checks whether any group of activation functions that has not yet been acquired remains in the grouped activation function information (step S812). If one remains, the process returns to step S805, and the circuit scale calculation unit 503 performs steps S805 to S812 for the next group. In other words, the circuit scale calculation unit 503 performs steps S805 to S812 for all groups of activation functions.
  • The circuitization location determination unit 504 receives, from the circuit scale calculation unit 503, the list of circuits required to process the neural network together with their circuit scales, and determines, taking the capacity of the target FPGA into account, which processes of the neural network should be circuitized. The processes determined not to be circuitized are processed in software on a processor such as the CPU.
  • Two criteria are considered when selecting the processes to circuitize: whether the execution time of the neural network processing becomes small (short), and whether the accuracy of the neural network remains high.
  • The execution time of neural network processing is shorter when a process is circuitized than when it is processed in software. Therefore, when selecting the processes to circuitize with the execution time of the neural network processing as the criterion, among the list of circuits required to process the neural network received from the circuit scale calculation unit 503, a process is given higher priority for circuitization the larger (longer) its processing time when processed in software. The number of times each circuit is used during neural network processing is also taken into account. For example, suppose the execution time of process A when processed in software is 5 milliseconds and that of process B is 30 milliseconds, but process A is used 20 times in the neural network and process B twice. Viewed over the entire neural network, the processing time of process A (5 ms x 20 = 100 ms) is longer than that of process B (30 ms x 2 = 60 ms), so it is determined that process A should be circuitized with priority over process B. A sketch of this metric follows.
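  • A small C sketch of that prioritization: total software time is per-run time multiplied by the number of uses, and candidates are sorted in descending order of that total. The struct and function names are assumptions; the commented example reproduces the A/B arithmetic from the text.

```c
#include <stdlib.h>

struct candidate {
    const char *name;
    double sw_time_ms;  /* execution time when processed in software */
    int    times_used;  /* uses per full evaluation of the network   */
};

static double total_sw_time(const struct candidate *c)
{
    return c->sw_time_ms * c->times_used;
}

/* qsort comparator: larger total software time first, i.e. higher
 * circuitization priority. */
static int by_total_time_desc(const void *a, const void *b)
{
    double ta = total_sw_time((const struct candidate *)a);
    double tb = total_sw_time((const struct candidate *)b);
    return (ta < tb) - (ta > tb);
}

/* Example from the text: A = 5 ms x 20 = 100 ms, B = 30 ms x 2 = 60 ms,
 * so A sorts ahead of B:
 *
 *   struct candidate c[] = { {"A", 5.0, 20}, {"B", 30.0, 2} };
 *   qsort(c, 2, sizeof c[0], by_total_time_desc);
 */
```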
  • The accuracy of the neural network is basically the same whether a process is circuitized or processed in software. However, when an activation function is replaced with a linear approximation so that it can be circuitized, the accuracy may decrease. Therefore, when deciding whether to circuitize activation function processing, both the degree of reduction in execution time and the degree of decrease in accuracy due to circuitization must be considered.
  • It is also considered that the more times an activation function is used within the neural network, the greater the decrease in accuracy when that activation function is linearly approximated.
  • The location where the activation function is used within the neural network also affects the degree of accuracy decrease caused by linear approximation.
  • For example, an activation function near the input layer, which is the first layer of the neural network, directly affects the input data to the neural network, so the decrease in accuracy caused by linear approximation is considered large, whereas an activation function used in processing after feature extraction is considered to cause only a small decrease in accuracy when linearly approximated.
  • The circuitization location determination unit 504 evaluates the degree of decrease in the accuracy of the neural network by integrating these factors. For example, the circuitization location determination unit 504 may calculate the product of the error caused by the linear approximation of an activation function and the number of times that activation function is used within the neural network, and use the product as the degree of accuracy decrease due to the linear approximation.
  • The circuitization location determination unit 504 determines the circuitization locations by weighing the balance between the conflicting factors of the execution time of the neural network processing and the accuracy of the neural network. For example, a weighted linear sum of the execution time of the neural network processing and the accuracy of the neural network can be used as an evaluation function, and the portions to be circuitized can be determined so that the value of the evaluation function is minimized.
  • Constraints may also be added, such as an upper limit on the execution time of the neural network processing, a lower limit on the accuracy of the neural network, the capacity of the shared memory required between the CPU and the FPGA, and an upper limit on the bandwidth of the data transfer bus. Furthermore, since extreme increases in execution time and decreases in accuracy are undesirable, a nonlinear function that penalizes increases in execution time and decreases in accuracy may be used as the evaluation function. One possible form of such an evaluation function is sketched below.
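  • The following C sketch shows one possible form of such an evaluation function: a weighted linear sum of execution time and accuracy degradation (the latter computable, as above, as approximation error times use count), plus a hinge-style penalty beyond soft limits. The weights, limits, and penalty shape are illustrative assumptions, and the search that minimizes this value over candidate circuitization choices is not shown.

```c
#include <math.h>

struct eval_weights {
    double w_time, w_acc;   /* linear weights                           */
    double time_limit_ms;   /* soft upper bound on execution time       */
    double acc_loss_limit;  /* soft upper bound on accuracy degradation */
    double penalty;         /* extra slope applied beyond the bounds    */
};

/* Lower is better; the circuitization location determination unit 504
 * would pick the assignment of operations to circuit/software that
 * minimizes this value, subject to the FPGA capacity and the other
 * constraints listed above. */
double evaluate(double exec_time_ms, double acc_loss,
                const struct eval_weights *w)
{
    double cost = w->w_time * exec_time_ms + w->w_acc * acc_loss;
    cost += w->penalty * fmax(0.0, exec_time_ms - w->time_limit_ms);
    cost += w->penalty * fmax(0.0, acc_loss - w->acc_loss_limit);
    return cost;
}
```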
  • As described above, whether each process of the neural network is to be circuitized on the FPGA or processed in software on a processor such as a CPU is determined in consideration of the degree to which circuitizing the process shortens the execution time of the neural network processing and the degree to which it decreases the accuracy, and the neural network processing is designed accordingly. This makes it possible to design a neural network in which a processor and circuits on an FPGA process in cooperation with each other, and to realize a neural network capable of high-speed processing on hardware with small resources.
  • FIGS. 9 and 10 are diagrams each showing an example of the hardware configuration of the neural network construction unit 101.
  • Each function of the constituent elements of the neural network construction unit 101 shown in FIG. 1 is realized, for example, by the processing circuit 10 shown in FIG. 9. That is, the neural network construction unit 101 includes a processing circuit 10 for determining, for each operation constituting the neural network, whether to circuitize the operation or process it in software, and for creating and outputting circuit information for circuitizing the operations determined to be circuitized and a program for software-processing the operations determined to be processed in software.
  • The processing circuit 10 may be dedicated hardware, or may be configured using a processor (also called a CPU (Central Processing Unit), processing device, arithmetic device, microprocessor, microcomputer, or DSP (Digital Signal Processor)) that executes programs stored in memory.
  • The processing circuit 10 may be, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a combination of these.
  • The functions of the constituent elements of the neural network construction unit 101 may be realized by individual processing circuits, or may be realized collectively by a single processing circuit.
  • FIG. 10 shows an example of the hardware configuration of the neural network construction unit 101 in the case where the processing circuit 10 is configured using a processor 11 that executes programs.
  • In this case, the functions of the constituent elements of the neural network construction unit 101 are realized by software or the like (software, firmware, or a combination of software and firmware).
  • The software and the like are written as programs and stored in the memory 12.
  • The processor 11 realizes the functions of the respective units by reading and executing the programs stored in the memory 12. That is, the neural network construction unit 101 includes a memory 12 for storing programs that, when executed by the processor 11, determine, for each operation constituting the neural network, whether to circuitize the operation or process it in software, and create and output circuit information for circuitizing the operations determined to be circuitized and a program for software-processing the operations determined to be processed in software.
  • In other words, these programs can be said to cause a computer to execute the procedures and methods of the operations of the constituent elements of the neural network construction unit 101.
  • Here, the memory 12 may be, for example, a non-volatile or volatile semiconductor memory, an HDD (Hard Disk Drive), a magnetic disk, a flexible disk, an optical disc, a compact disc, a mini disc, or a DVD (Digital Versatile Disc), including their drive devices, or any storage medium to be used in the future.
  • The configuration is not limited to the above; some of the constituent elements of the neural network construction unit 101 may be realized by dedicated hardware, and the other constituent elements by software or the like.
  • For example, the functions of some constituent elements can be realized by the processing circuit 10 as dedicated hardware, while for the other constituent elements the processing circuit 10 as the processor 11 can realize their functions by reading and executing the programs stored in the memory 12.
  • As described above, the neural network construction unit 101 can realize each of the above functions by hardware, software, or the like, or a combination thereof.
  • FIG. 11 is a block diagram showing the configuration of the neural network device 100 according to Embodiment 2.
  • In FIG. 11, elements that are the same as or equivalent to those shown in Embodiment 1 (FIG. 1) are denoted by the same reference numerals, and their description is omitted here.
  • In Embodiment 2, the neural network device 100 includes a neural network execution unit 901 in addition to the neural network construction unit 101.
  • The neural network execution unit 901 includes a storage unit 905, a CPU 902, an FPGA 903, a memory 904, and a data acquisition circuit 906.
  • The neural network execution unit 901 executes the arithmetic processing of the neural network, with the CPU 902 and the FPGA 903 working together, based on the program and circuit information created by the neural network construction unit 101.
  • The storage unit 905 stores the program and circuit information created by the neural network construction unit 101.
  • The CPU 902 reads the program stored in the storage unit 905 and, based on the program, performs the arithmetic processing of the neural network assigned to the CPU 902 and controls the FPGA 903.
  • The FPGA 903 reads the circuit information stored in the storage unit 905, configures arithmetic circuits based on the circuit information, and performs the arithmetic processing of the neural network assigned to the FPGA 903.
  • The memory 904 relays the data exchanged between the CPU 902 and the FPGA 903. More specifically, the CPU 902 stores in the memory 904 the input data for operations by the circuits built on the FPGA 903, and the FPGA 903 reads this input data and uses it for the operations of those circuits. The FPGA 903 in turn stores the operation results in the memory 904, and the CPU 902 reads the operation results from the memory 904 and uses them in software processing.
  • The data acquisition circuit 906 is a circuit used when the FPGA 903 reads data from the memory 904. In this embodiment, the data acquisition circuit 906 is constructed as one of the arithmetic circuits on the FPGA 903.
  • In general, when an FPGA reads data from an external memory and performs an operation, it starts acquiring the data upon receiving a notification that the necessary data has been stored in a predetermined location in the memory.
  • The data acquisition circuit 906 is provided to omit this notification process. Specifically, the size of the input data of each circuit on the FPGA 903 is determined for the data acquisition circuit 906 in advance, and when data of that size becomes available in the memory 904, the data acquisition circuit 906 automatically transfers the data to the FPGA 903, as sketched below.
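  • A minimal C sketch of that polling behavior; the struct, the transfer_to_fpga helper, and the byte-counting scheme are assumptions introduced for illustration (on real hardware this logic would live in the data acquisition circuit itself rather than in CPU code).

```c
#include <stdbool.h>
#include <stddef.h>

struct acq_channel {
    int    circuit_id;
    size_t input_size;   /* preconfigured from the circuit information */
    size_t bytes_ready;  /* bytes written to shared memory so far      */
};

extern void transfer_to_fpga(int circuit_id, size_t nbytes);  /* assumed */

/* Once a full input block is available, transfer it without waiting
 * for a CPU command; otherwise keep waiting. */
bool poll_and_transfer(struct acq_channel *ch)
{
    if (ch->bytes_ready < ch->input_size)
        return false;
    transfer_to_fpga(ch->circuit_id, ch->input_size);
    ch->bytes_ready -= ch->input_size;
    return true;
}
```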
  • The size of the input data is determined when the neural network construction unit 101 determines the circuit configuration, and can therefore be calculated by the data acquisition circuit control data generation unit 303 of the neural network construction unit 101.
  • The neural network construction unit 101 includes the input data size of each circuit, calculated by the data acquisition circuit control data generation unit 303, in the circuit information, and stores the circuit information in the storage unit 905.
  • Although the CPU 902, the FPGA 903, and the memory 904 are shown as separate blocks in FIG. 11, they may be integrated into a single device.
  • Reference signs: 100 neural network device; 101 neural network construction unit; 102 neural network analysis unit; 103 neural network calculation method output unit; 104 storage unit; 201 network structure analysis unit; 202 neural network division unit; 301 control program creation unit; 302 arithmetic circuit creation unit; 303 data acquisition circuit control data generation unit; 401 calculation structure classification unit; 402 convolution layer analysis unit; 403 activation layer analysis unit; 501 convolutional layer circuitization unit; 502 activation layer circuitization unit; 503 circuit scale calculation unit; 504 circuitization location determination unit; 901 neural network execution unit; 902 CPU; 903 FPGA; 904 memory; 905 storage unit; 906 data acquisition circuit; 10 processing circuit; 11 processor; 12 memory.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

A neural network analysis unit (102) of a neural network device (100) has a network structure analysis unit (201) that analyzes the calculation structure of a neural network, and a neural network division unit (202) that determines, for each operation obtained by dividing the neural network, whether to implement the operation as a circuit or process it in software. The neural network division unit (202) has a convolutional layer circuitization unit (501) that groups layers having identical or similar parameters among the layers of the neural network where convolution operations are performed, a circuit scale calculation unit (503) that calculates, for each convolution operation of the grouped layers, the circuit scale if that operation were made into a circuit, and a circuitization location determination unit (504) that determines the operations to make into circuits on the basis of the circuit scales calculated by the circuit scale calculation unit (503).

Description

Neural network device

The present disclosure relates to artificial intelligence technology, and particularly to an apparatus and method for creating a program that processes a neural network.

In fields such as image processing, neural networks can perform processing with extremely high accuracy and have recently come into wide use. At the same time, neural networks involve a large number of calculations and are known to have a high processing load. To complete processing within a desired time, neural networks are therefore often realized on dedicated processors such as GPGPUs (General Purpose Graphics Processing Units) or on hardware such as FPGAs (Field Programmable Gate Arrays) and ASICs (Application Specific Integrated Circuits).

Although a neural network includes a large number of operations, it is composed of a combination of operations such as convolutions, activation functions, and full connections, and its structure is relatively simple. For example, Patent Document 1 below proposes a technique that utilizes this characteristic of neural networks to solve the problem of the large amount of calculation. In the technique of Patent Document 1, different network structures are compiled into operations of the same arithmetic unit by controlling that unit with a single instruction corresponding to each layer operation of the multi-layer computation of a neural network, whereby the same device can realize the logic operations of all layers.

Patent Document 1: JP 2019-139747 A

The technique of Patent Document 1 can use hardware resources efficiently and achieve high-speed neural network processing by chaining single instructions prepared in advance on the hardware according to the network structure. However, because processing is performed with combinations of single instructions prepared in advance on the hardware, there is a concern that processing may not be completed within the desired time.

The present disclosure has been made to solve the above problem, and aims to enable the design of neural network processing that can achieve desired performance (execution time and accuracy) on hardware with fewer resources.

A neural network device according to the present disclosure includes: a neural network analysis unit that determines, for each operation constituting a neural network, a calculation method, namely whether to implement the operation as a circuit or process it in software; and a neural network calculation method output unit that creates and outputs circuit information for circuitizing the operations determined to be circuitized and a program for software-processing the operations determined to be processed in software. The neural network analysis unit has a network structure analysis unit that analyzes the calculation structure of the neural network, and a neural network division unit that determines, for each operation obtained by dividing the neural network, whether to circuitize the operation or process it in software. The network structure analysis unit has a calculation structure classification unit that classifies each layer of the neural network according to the types of operations constituting the layer, and a convolution layer analysis unit that specifies the parameters of the convolution operation for each layer classified as performing a convolution. The neural network division unit has a convolutional layer circuitization unit that groups layers having identical or similar parameters based on the parameters specified by the convolution layer analysis unit, a circuit scale calculation unit that calculates, for each convolution operation of the grouped layers, the circuit scale when that operation is circuitized, and a circuitization location determination unit that determines the operations to be circuitized based on the circuit scales calculated by the circuit scale calculation unit.

According to the present disclosure, it is possible to design neural network processing that achieves desired performance (processing within the required time, and accuracy) on hardware with fewer resources.

The objects, features, aspects, and advantages of the present disclosure will become more apparent from the following detailed description and the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of a neural network device according to Embodiment 1. FIG. 2 is a block diagram showing the configuration of the neural network analysis unit according to Embodiment 1. FIG. 3 is a block diagram showing the configuration of the neural network calculation method output unit according to Embodiment 1. FIG. 4 is a block diagram showing the configuration of the network structure analysis unit according to Embodiment 1. FIG. 5 is a block diagram showing the configuration of the neural network division unit according to Embodiment 1. FIG. 6 is a flowchart showing the processing of the convolutional layer circuitization unit according to Embodiment 1. FIG. 7 is a flowchart showing the processing of the activation layer circuitization unit according to Embodiment 1. FIG. 8 is a flowchart showing the processing of the circuit scale calculation unit according to Embodiment 1. FIGS. 9 and 10 are diagrams each showing an example of the hardware configuration of the neural network construction unit. FIG. 11 is a block diagram showing the configuration of a neural network device according to Embodiment 2.
 <実施の形態1>
 図1は、実施の形態1に係るニューラルネットワーク装置100の構成を示すブロック図である。図1のように、ニューラルネットワーク装置100は、ニューラルネットワーク解析部102と、ニューラルネットワーク演算方式出力部103と、記憶部104とを有するニューラルネットワーク構築部101を備える。
<Embodiment 1>
FIG. 1 is a block diagram showing the configuration of a neural network device 100 according to the first embodiment. As shown in FIG. 1, the neural network device 100 includes a neural network construction unit 101 having a neural network analysis unit 102, a neural network calculation method output unit 103, and a storage unit 104.
 ニューラルネットワーク解析部102は、記憶部104に格納されたニューラルネットワークのネットワーク構造データを読み取り、そのネットワーク構造を解析して、ニューラルネットワークを動作させるプログラムおよび回路を含む演算方式を決定し、決定した演算方式を出力する。すなわち、ニューラルネットワーク解析部102は、ニューラルネットワークを構成する各演算について、回路化するかソフトウェア処理するかという演算方式を決定する。 The neural network analysis unit 102 reads the network structure data of the neural network stored in the storage unit 104, analyzes the network structure, determines a calculation method including a program and circuit for operating the neural network, and executes the determined calculation. Output the method. That is, the neural network analysis unit 102 determines the calculation method for each calculation constituting the neural network, such as circuitization or software processing.
 ニューラルネットワーク演算方式出力部103は、ニューラルネットワーク解析部102から受け取った演算方式を基に、CPU(Central Processing Unit)等のプロセッサで動作するプログラムのデータと、FPGA上に演算回路を構築するための回路情報とを作成して出力する。すなわち、ニューラルネットワーク演算方式出力部103は、ニューラルネットワーク解析部102により回路化すると決定された演算を回路化するための回路情報と、ニューラルネットワーク解析部102によりソフトウェア処理すると決定された演算をソフトウェア処理するためのプログラムを作成して出力する。 Based on the calculation method received from the neural network analysis unit 102, the neural network calculation method output unit 103 outputs program data that runs on a processor such as a CPU (Central Processing Unit) and data for constructing a calculation circuit on an FPGA. Create and output circuit information. That is, the neural network calculation method output unit 103 outputs circuit information for converting into a circuit the calculations determined by the neural network analysis unit 102 to be processed into a circuit, and software processes the calculations determined to be processed by the neural network analysis unit 102 using software. Create a program to do this and output it.
 図2は、ニューラルネットワーク解析部102の構成を示すブロック図である。図2のように、ニューラルネットワーク解析部102は、ネットワーク構造解析部201と、ニューラルネットワーク分割部202とを有する。 FIG. 2 is a block diagram showing the configuration of the neural network analysis section 102. As shown in FIG. 2, the neural network analysis section 102 includes a network structure analysis section 201 and a neural network division section 202.
 ネットワーク構造解析部201は、記憶部104から読み取ったニューラルネットワークのネットワーク構造データを基に、ネットワーク内の演算構造を解析し、その解析結果を出力する。 The network structure analysis unit 201 analyzes the calculation structure within the network based on the network structure data of the neural network read from the storage unit 104, and outputs the analysis result.
 ニューラルネットワーク分割部202は、ネットワーク構造解析部201から入力されるネットワーク内の演算構造の解析結果を受け、ネットワーク構造を構成する演算を、予め定められた規模の演算単位に分割し、分割後の各演算の処理をCPUで動作させるかFPGAで動作させるか(つまり、ソフトウェア処理するか回路化するか)を決定し、その決定結果を分割後の各演算と関連付けして出力する。 The neural network dividing unit 202 receives the analysis result of the calculation structure in the network input from the network structure analysis unit 201, divides the calculations making up the network structure into calculation units of a predetermined size, and divides the calculations into calculation units of a predetermined size. It is determined whether the processing of each calculation is to be performed by the CPU or by the FPGA (that is, whether to perform software processing or circuitization), and the determination result is output in association with each calculation after division.
 図3は、ニューラルネットワーク演算方式出力部103の構成を示すブロック図である。図3のように、ニューラルネットワーク演算方式出力部103は、制御プログラム作成部301と、演算回路作成部302と、データ取得回路用制御データ生成部303とを有する。 FIG. 3 is a block diagram showing the configuration of the neural network calculation method output section 103. As shown in FIG. 3, the neural network calculation method output unit 103 includes a control program creation unit 301, an arithmetic circuit creation unit 302, and a data acquisition circuit control data generation unit 303.
 制御プログラム作成部301は、ニューラルネットワーク解析部102が決定したニューラルネットワークの各処理の演算方式を受け、CPUでソフトウェア処理する演算について、CPU向けのプログラムを作成して出力する。当該プログラムには、FPGA上で動く演算回路の入出力を管理し、FPGAを制御することでニューラルネットワーク全体の処理を可能とする制御プログラムも含まれる。制御プログラムは、例えば、演算回路Aへデータを入力し、演算回路Aから演算結果を受け取り、それを演算回路Bへと入力する、といった処理を行う。また、演算処理の一部をFPGA上で行わずCPU上で行うために、演算処理の一部を制御プログラムに含ませてもよい。その場合、制御プログラムは、例えば、演算回路Aと演算回路Bの出力の積を計算する、といった処理を行う。制御プログラム作成部301が出力するプログラムは、例えばC言語などで記述されたコードを、コンパイラにより特定のCPU向けにコンパイルした実行プログラムのバイナリデータなどである。 The control program creation unit 301 receives the calculation method for each process of the neural network determined by the neural network analysis unit 102, and creates and outputs a program for the CPU for calculations to be processed by software in the CPU. The program also includes a control program that manages the input/output of an arithmetic circuit running on the FPGA and enables processing of the entire neural network by controlling the FPGA. The control program performs processing such as inputting data to arithmetic circuit A, receiving a calculation result from arithmetic circuit A, and inputting it to arithmetic circuit B, for example. Furthermore, in order to perform part of the calculation process on the CPU instead of on the FPGA, a part of the calculation process may be included in the control program. In that case, the control program performs processing such as calculating the product of the outputs of arithmetic circuit A and arithmetic circuit B, for example. The program output by the control program creation unit 301 is, for example, binary data of an executable program obtained by compiling a code written in C language or the like using a compiler for a specific CPU.
 演算回路作成部302は、ニューラルネットワーク解析部102が決定したニューラルネットワークを動作させるプログラムおよび回路を含む演算方式を受け、FPGA上で動作する演算回路を構築するための回路情報を作成して出力する。 The arithmetic circuit creation unit 302 receives the arithmetic method including the program and circuit for operating the neural network determined by the neural network analysis unit 102, and creates and outputs circuit information for constructing an arithmetic circuit that operates on the FPGA. .
 データ取得回路用制御データ生成部303は、FPGAがCPUからデータを受け取る際に使用する、共有メモリなどからデータを受けとる専用回路に提供するパラメータを、回路情報に基づいて算出する。例えば、FPGA上の演算回路Aが、一定のサイズ(データ幅)のデータを受け取り、それに含まれる一定数のデータの平均を演算するものであり、一定のサイズのデータが共有メモリに格納されると、演算回路AがCPUからの演算実行指令を受け取ることなしに演算を開始する動作、すなわち演算回路Aが自律的にデータを取得する動作を行う場合、上記のパラメータは、その動作を可能にするための「一定のサイズ」の値を、予めFPGA上の演算回路に組み込むためのものである。 The data acquisition circuit control data generation unit 303 calculates parameters to be provided to a dedicated circuit that receives data from a shared memory or the like, which is used when the FPGA receives data from the CPU, based on circuit information. For example, an arithmetic circuit A on an FPGA receives data of a certain size (data width) and calculates the average of a certain number of data contained therein, and the data of a certain size is stored in the shared memory. If the arithmetic circuit A performs an operation of starting an operation without receiving an operation execution command from the CPU, that is, an operation of autonomously acquiring data, the above parameters enable the operation. This is to pre-incorporate a "fixed size" value for the calculation into the arithmetic circuit on the FPGA.
 図4は、ネットワーク構造解析部201の構成素示すブロック図である。図4のように、ネットワーク構造解析部201は、演算構造分類部401と、畳み込み層分析部402と、活性化層分析部403とを有する。 FIG. 4 is a block diagram showing the components of the network structure analysis unit 201. As shown in FIG. 4, the network structure analysis section 201 includes an arithmetic structure classification section 401, a convolution layer analysis section 402, and an activation layer analysis section 403.
 演算構造分類部401は、記憶部104から読み取ったニューラルネットワークのネットワーク構造データを基に、ニューラルネットワーク内の各層がどのような演算で構成されているかを分析し、その分析結果を各層の演算情報として各層に関連付けて出力する。各層に関連付ける演算は、例えば、畳み込み演算であったり、活性化関数を用いた演算であったりする。なお、各層に関連付ける演算には、畳み込み演算および活性化関数以外の他の演算が含まれていてもよい。その場合、他の演算は、回路化されてもよいし、CPUで処理されてもよい。例えば、全結合層を構成する演算は、そのまま回路化されてもよいし、パラメータの大きな畳み込み演算とみなして畳み込み層として回路化してもよい。 Based on the network structure data of the neural network read from the storage unit 104, the calculation structure classification unit 401 analyzes what kind of calculation each layer in the neural network is composed of, and uses the analysis result as calculation information for each layer. It is associated with each layer and output as . The operation associated with each layer is, for example, a convolution operation or an operation using an activation function. Note that the operations associated with each layer may include operations other than the convolution operation and the activation function. In that case, other calculations may be implemented in a circuit or processed by a CPU. For example, the operations constituting the fully connected layer may be circuitized as they are, or may be treated as convolution operations with large parameters and circuitized as a convolution layer.
 畳み込み層分析部402は、演算構造分類部401からニューラルネットワーク内の各層の演算情報を受け取り、それらの層のうち畳み込み演算を行う層の分析を行う。具体的には、畳み込み演算を行う各層で行われる畳み込み演算のパラメータを特定する。当該パラメータは、例えば、入力サイズ、出力サイズ、カーネル数、カーネルサイズ、BorderModeなどである。 The convolutional layer analysis unit 402 receives the calculation information of each layer in the neural network from the calculation structure classification unit 401, and analyzes the layer that performs the convolution calculation among those layers. Specifically, the parameters of the convolution operation performed in each layer that performs the convolution operation are specified. The parameters include, for example, input size, output size, number of kernels, kernel size, BorderMode, and the like.
 活性化層分析部403は、演算構造分類部401からニューラルネットワーク内の各層の演算情報を受け取り、それらの層のうち活性化関数に基づく処理を行う層の分析を行う。具体的には、活性化関数に基づく処理を行う各層で用いられる活性化関数が何かを特定する。 The activation layer analysis unit 403 receives the calculation information of each layer in the neural network from the calculation structure classification unit 401, and analyzes the layer that performs processing based on the activation function among those layers. Specifically, what activation function is used in each layer that performs processing based on the activation function is specified.
 FIG. 5 is a diagram showing the configuration of the neural network division unit 202. As shown in FIG. 5, the neural network division unit 202 includes a convolution layer circuitization unit 501, an activation layer circuitization unit 502, a circuit scale calculation unit 503, and a circuitization location determination unit 504.
 The convolution layer circuitization unit 501 receives the analysis result from the network structure analysis unit 201 and outputs information relating to the circuitization of convolution processing.
 The activation layer circuitization unit 502 receives the analysis result from the network structure analysis unit 201 and outputs information relating to the circuitization of activation functions.
 The circuit scale calculation unit 503 receives the circuitization information output by the convolution layer circuitization unit 501 and the activation layer circuitization unit 502, and calculates the circuit scale that would result from circuitizing each operation.
 The circuitization location determination unit 504 receives the circuit scale information calculated by the circuit scale calculation unit 503 and separates the operations to be circuitized from those to be processed, without circuitization, on a processor such as a CPU.
 FIG. 6 is a flowchart showing the processing of the convolution layer circuitization unit 501. The operation of the convolution layer circuitization unit 501 is described below with reference to FIG. 6.
 The convolution layer circuitization unit 501 first obtains the information of one convolution layer from the per-layer operation information of the neural network received from the network structure analysis unit 201 (step S601). It then checks the parameters contained in the obtained information, associates the parameter information with that convolution layer, and stores it as convolution layer information (step S602).
 Next, the convolution layer circuitization unit 501 checks whether any convolution layers remain in the per-layer operation information whose information has not yet been obtained (step S603). If such layers remain, processing returns to step S601, and the convolution layer circuitization unit 501 performs steps S601 and S602 on the next convolution layer.
 Once the information of all convolution layers has been obtained, the convolution layer circuitization unit 501 extracts, from the convolution layers corresponding to the stored convolution layer information, layers that have identical parameters, and groups them together (step S604). In other words, the convolution layer circuitization unit 501 partitions the convolution layers into groups according to their parameters. Here the unit of grouping is the layer, but the product-sum operations that make up the convolution within a layer may be used as a finer grouping unit.
 When the grouping of step S604 is complete, the convolution layer circuitization unit 501 checks whether the number of groups created is at least a predetermined number (step S605). If so, it further groups together groups that have similar parameters (step S606). "Similar" here may be defined as parameters lying close to each other within a certain range, or as an inclusion relationship in which the parameters of one group contain those of another. Groups whose parameters are in an inclusion relationship are merged because, once the containing operation has been circuitized, the contained operation can be executed using part of that circuit. When the grouping results are stored, the number of times the operations belonging to each group are used across the entire neural network is stored in association with them.
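 A minimal sketch of this two-stage grouping, assuming scalar parameters and a dict-based layer description (both assumptions of this illustration, not of the disclosure), might read:

```python
# Group convolution layers by identical parameters; if too many groups
# remain, merge groups whose parameters are contained in a larger group's,
# since the larger circuit can execute the smaller operation with part of
# its hardware.
def group_conv_layers(conv_layers, max_groups=8):
    """conv_layers: dicts with 'kernel_size', 'in_size', 'out_size', 'n_kernels'."""
    groups = {}
    for layer in conv_layers:
        key = (layer["kernel_size"], layer["in_size"],
               layer["out_size"], layer["n_kernels"])
        groups.setdefault(key, []).append(layer)

    if len(groups) >= max_groups:
        merged = {}
        # Visit larger parameter sets first so smaller ones can join them.
        for key, members in sorted(groups.items(), reverse=True):
            host = next((k for k in merged
                         if all(a <= b for a, b in zip(key, k))), None)
            if host is not None:
                merged[host].extend(members)   # reuses part of the larger circuit
            else:
                merged[key] = list(members)
        groups = merged

    # Store how often each group's operation is used across the network.
    return {key: {"layers": members, "use_count": len(members)}
            for key, members in groups.items()}
```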
 FIG. 7 is a flowchart showing the processing of the activation layer circuitization unit 502. The operation of the activation layer circuitization unit 502 is described below with reference to FIG. 7.
 The activation layer circuitization unit 502 first obtains the activation function of each layer contained in the per-layer operation information of the neural network received from the network structure analysis unit 201, and stores it as activation function information (step S701). It then extracts identical activation functions from among those stored and groups them together (step S702). In other words, the activation layer circuitization unit 502 partitions the layers that perform activation-function-based processing into groups according to their activation function. When the grouping results are stored, the number of times the operations belonging to each group are used across the entire neural network is stored in association with them.
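 For the grouping itself, a minimal sketch (the pair layout is assumed for the illustration):

```python
# Group layers by the activation function they use and record each
# function's network-wide use count.
from collections import defaultdict

def group_activations(layers):
    """layers: (layer_name, activation_name) pairs."""
    groups = defaultdict(list)
    for layer_name, act in layers:
        groups[act].append(layer_name)
    return {act: {"layers": names, "use_count": len(names)}
            for act, names in groups.items()}

# group_activations([("c1", "relu"), ("c2", "relu"), ("out", "sigmoid")])
# -> {"relu": {"layers": ["c1", "c2"], "use_count": 2},
#     "sigmoid": {"layers": ["out"], "use_count": 1}}
```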
 Next, the activation layer circuitization unit 502 takes one of the grouped activation functions (step S703) and checks whether it can be linearly approximated (step S704). If it can, the linearly approximated function is stored in association with that activation function (step S705).
 The activation layer circuitization unit 502 then checks whether any grouped activation functions remain that have not yet been taken (step S706). If so, processing returns to step S703, and the activation layer circuitization unit 502 performs steps S703 to S706 on the next activation function. In other words, the activation layer circuitization unit 502 performs steps S703 to S706 on all of the grouped activation functions.
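 As one hedged illustration of the linear-approximation check of steps S704 and S705 (the tolerance, input range, and least-squares fit are assumptions of this sketch; the patent does not prescribe a method):

```python
# Fit a line to an activation function over the input range the network can
# produce, and accept the approximation only if the worst-case error is small.
import numpy as np

def linear_fit(f, lo, hi, n=256):
    """Least-squares line a*x + b fitted to f over [lo, hi]."""
    x = np.linspace(lo, hi, n)
    a, b = np.polyfit(x, f(x), deg=1)
    max_err = np.max(np.abs(f(x) - (a * x + b)))
    return (a, b), max_err

def approximate_if_possible(activation, lo, hi, tol=0.05):
    (a, b), err = linear_fit(activation, lo, hi)
    if err <= tol:
        return {"slope": a, "intercept": b, "max_error": err}
    return None   # keep the exact activation (larger circuit, or software)

# tanh is close to linear near the origin, so a narrow range passes:
# approximate_if_possible(np.tanh, -0.5, 0.5)
```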
 For each group in the convolution layer information grouped by the convolution layer circuitization unit 501 and in the activation function information grouped by the activation layer circuitization unit 502, the circuit scale calculation unit 503 calculates the scale of the circuit that would result from circuitization. This scale can be calculated, for example, from the number of arithmetic units, such as adders and multipliers, contained in the circuit.
 FIG. 8 is a flowchart showing the processing of the circuit scale calculation unit 503. The operation of the circuit scale calculation unit 503 is described below with reference to FIG. 8.
 The circuit scale calculation unit 503 first obtains one group of convolution layers contained in the grouped convolution layer information (step S801). Based on the parameters stored in association with the obtained convolution layers, it determines the arithmetic units needed to circuitize the convolution of that group (step S802). From the type and number of the determined arithmetic units, it then calculates the circuit scale that circuitizing the convolution would require (step S803).
 The arithmetic unit selected when circuitizing an operation need not be the smallest one capable of executing it; a unit of deliberately larger circuit scale may be chosen. Because a circuit whose size is a power of two is efficient in terms of data access, the circuit scale of an arithmetic unit may be intentionally enlarged so that it becomes a power of two.
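 The power-of-two padding mentioned above could be modeled as follows; the operator cost weights are invented for the illustration:

```python
# Estimate a circuit's scale from its operator counts and pad the result up
# to the next power of two for data-access efficiency.
def next_pow2(n: int) -> int:
    p = 1
    while p < n:
        p <<= 1
    return p

def circuit_scale(n_adders: int, n_multipliers: int,
                  adder_cost: int = 1, mult_cost: int = 4) -> int:
    raw = n_adders * adder_cost + n_multipliers * mult_cost
    return next_pow2(raw)

# circuit_scale(9, 9) -> raw cost 45, padded to 64
```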
 The circuit scale calculation unit 503 then checks whether any groups of convolution layers remain in the grouped convolution layer information that have not yet been obtained (step S804). If so, processing returns to step S801, and the circuit scale calculation unit 503 performs steps S801 to S803 on the next group. In other words, it performs steps S801 to S803 on all of the convolution layer groups.
 Once all convolution layer groups have been obtained, the circuit scale calculation unit 503 obtains one group of activation functions, corresponding to an activation layer, from the grouped activation function information (step S805). It then determines the arithmetic units needed to circuitize that activation function (step S806), and from their type and number calculates the circuit scale that circuitizing the activation function would require (step S807).
 Next, the circuit scale calculation unit 503 judges whether the activation function can be linearly approximated (step S808). If it can, the approximating function is identified (step S809), the arithmetic units needed to circuitize the approximating function are determined (step S810), the circuit scale is calculated from the type and number of the determined units (step S811), and the calculated circuit scale is stored in association with the activation function.
 The circuit scale calculation unit 503 then checks whether any groups of activation functions remain in the grouped activation function information that have not yet been obtained (step S812). If so, processing returns to step S805, and the circuit scale calculation unit 503 performs steps S805 to S812 on the next group. In other words, it performs steps S805 to S812 on all of the activation function groups.
 The circuitization location determination unit 504 receives from the circuit scale calculation unit 503 the list of circuits needed to process the neural network together with their circuit scales, and, taking the capacity of the target FPGA into account, decides which parts of the neural network processing should be circuitized. Processing that is decided not to be circuitized is then executed in software on a processor such as a CPU. Two factors are weighed as the criteria for selecting the processing to circuitize (that is, for judging whether each piece of processing should be circuitized): whether the execution time of the neural network processing becomes shorter, and whether the accuracy of the neural network becomes higher.
 Normally, the execution time of neural network processing is shorter when a given piece of processing is circuitized than when it is executed in software. Therefore, when selecting processing to circuitize on the basis of execution time, the circuits in the list received from the circuit scale calculation unit 503 whose software execution time would be longer are judged to deserve circuitization with higher priority. The number of times each circuit is used during the neural network processing is also taken into account. For example, even if processing A takes 5 milliseconds in software and processing B takes 30 milliseconds, if A is used 20 times in the neural network and B only twice, then over the whole network A's total time (5 ms x 20 = 100 ms) exceeds B's (30 ms x 2 = 60 ms), so A is judged to deserve circuitization with higher priority than B.
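 The prioritization in this example reduces to a sort by total software time; a sketch, using the figures from the text and an assumed dict layout:

```python
# Rank candidate operations for circuitization by per-call software time
# multiplied by how often the network uses them.
candidates = [
    {"name": "A", "sw_time_ms": 5.0,  "uses": 20},   # 100 ms total
    {"name": "B", "sw_time_ms": 30.0, "uses": 2},    #  60 ms total
]

def circuitization_priority(ops):
    return sorted(ops, key=lambda op: op["sw_time_ms"] * op["uses"],
                  reverse=True)

# [op["name"] for op in circuitization_priority(candidates)] == ["A", "B"]
```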
 On the other hand, the accuracy of the neural network is essentially the same whether a given piece of processing is circuitized or executed in software. However, if an activation function is processed using a linear approximation, the accuracy may drop. The decision whether to circuitize activation function processing therefore has to weigh both the reduction in execution time and the loss of accuracy that circuitization brings. Regarding the accuracy loss due to linear approximation, the larger the error over the range of inputs the function can actually receive within the neural network (the difference between the approximated and exact outputs), the larger the loss of accuracy. Likewise, the more often an activation function is used within the network, the larger the accuracy loss from approximating it. The location where the activation function is used within the network also affects the degree of accuracy loss. For example, an activation function near the input layer, the first layer of the network, acts directly on the input data, so the accuracy loss from approximating it is large, whereas activation functions in later layers operate on data after feature extraction, so the loss is considered small. That said, it is not strictly necessary to rate activation functions closer to the input layer as suffering larger approximation losses. The circuitization location determination unit 504 combines these considerations to evaluate the overall loss of accuracy of the neural network. For example, it may compute the product of the error introduced by linearly approximating an activation function and the number of times that function is used within the network, and take this as the degree of accuracy loss due to the approximation.
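 The error-times-use-count score suggested above, with an optional depth weighting that the text notes need not be applied, might be sketched as:

```python
# Score the accuracy loss of linearly approximating one activation function.
def degradation_score(max_error: float, use_count: int,
                      depth: int = 0, total_depth: int = 1,
                      weight_by_depth: bool = False) -> float:
    score = max_error * use_count
    if weight_by_depth and total_depth > 0:
        # Assumption: layers nearer the input weigh more heavily.
        score *= 2.0 - depth / total_depth
    return score
```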
 Because circuitization beyond the FPGA's capacity is impossible, the FPGA capacity imposes an upper limit on how much processing can be circuitized. At the same time, the more processing is circuitized, the greater the reduction in the execution time of the neural network processing. Moreover, circuitizing an activation function that contains nonlinear processing yields a larger circuit than circuitizing one that contains only linear processing, so circuitizing a nonlinear activation function reduces, under the FPGA capacity constraint, the number of other pieces of processing that can be circuitized. Linearly approximating the activation function before circuitization avoids this problem, but at the cost of the accuracy loss the approximation introduces. In short, the execution time of the neural network processing (that is, the number of pieces of processing circuitized) and the accuracy of the neural network are in a trade-off. The circuitization location determination unit 504 therefore decides what to circuitize by balancing these conflicting factors. One conceivable approach is to take a weighted linear sum of the execution time and the accuracy as an evaluation function and choose the circuitized portions so as to minimize its value. Constraints may be added, such as an upper limit on the execution time, a lower limit on the accuracy, and upper limits on the shared memory capacity and data transfer bus bandwidth required between the CPU and the FPGA. Since extreme increases in execution time or drops in accuracy are undesirable, a nonlinear function that penalizes such increases and drops may also be used as the evaluation function.
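 A greedy sketch of this selection under an FPGA capacity budget, standing in for whatever optimizer an implementation would actually use, with invented weights and field names:

```python
# Choose operations to circuitize so that a weighted combination of time
# saved and accuracy lost is maximized without exceeding the FPGA capacity.
def select_circuits(ops, capacity, w_time=1.0, w_acc=10.0):
    """ops: dicts with 'scale', 'time_saved_ms', 'accuracy_loss'."""
    def gain(op):
        return (w_time * op["time_saved_ms"]
                - w_acc * op["accuracy_loss"]) / max(op["scale"], 1)

    chosen, used = [], 0
    for op in sorted(ops, key=gain, reverse=True):
        if gain(op) <= 0:
            continue                 # circuitizing this operation does not pay
        if used + op["scale"] <= capacity:
            chosen.append(op)
            used += op["scale"]
    return chosen
```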
 As described above, according to the neural network device 100 of the first embodiment, neural network processing is designed by judging, for each piece of processing, whether to circuitize it on the FPGA or to execute it in software on a processor such as a CPU, taking into account both the reduction in execution time and the loss of accuracy that circuitization brings. This makes it possible to design a neural network in which the processor and the circuits on the FPGA process data cooperatively, and to realize a neural network capable of high-speed processing on hardware with limited resources.
 FIGS. 9 and 10 are diagrams each showing an example of the hardware configuration of the neural network construction unit 101. The functions of the constituent elements of the neural network construction unit 101 shown in FIG. 1 are realized, for example, by the processing circuit 10 shown in FIG. 9. That is, the neural network construction unit 101 includes a processing circuit 10 for determining, for each operation that makes up the neural network, whether to circuitize the operation or process it in software, and for creating and outputting the circuit information for circuitizing the operations determined to be circuitized and the program for software-processing the operations determined to be processed in software. The processing circuit 10 may be dedicated hardware, or it may be configured using a processor (also called a central processing unit (CPU), a processing device, an arithmetic device, a microprocessor, a microcomputer, or a DSP (Digital Signal Processor)) that executes a program stored in a memory.
 When the processing circuit 10 is dedicated hardware, it may be, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a combination of these. The functions of the constituent elements of the neural network construction unit 101 may each be realized by an individual processing circuit, or those functions may be realized collectively by a single processing circuit.
 FIG. 10 shows an example of the hardware configuration of the neural network construction unit 101 when the processing circuit 10 is configured using a processor 11 that executes a program. In this case, the functions of the constituent elements of the neural network construction unit 101 are realized by software or the like (software, firmware, or a combination of software and firmware). The software or the like is written as a program and stored in the memory 12. The processor 11 realizes the functions of each unit by reading and executing the program stored in the memory 12. That is, the neural network construction unit 101 includes the memory 12 for storing a program that, when executed by the processor 11, results in the execution of: a process of determining, for each operation that makes up the neural network, whether to circuitize the operation or process it in software; and a process of creating and outputting the circuit information for circuitizing the operations so determined and the program for software-processing the operations so determined. In other words, this program can be said to cause a computer to execute the procedures and methods of operation of the constituent elements of the neural network construction unit 101.
 Here, the memory 12 may be, for example, a nonvolatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read Only Memory); an HDD (Hard Disk Drive); a magnetic disk, a flexible disk, an optical disc, a compact disc, a MiniDisc, or a DVD (Digital Versatile Disc) and its drive device; or any storage medium to be used in the future.
 The above has described configurations in which the functions of the constituent elements of the neural network construction unit 101 are realized entirely by hardware or entirely by software or the like. The configuration is not limited to this, however: some constituent elements of the neural network construction unit 101 may be realized by dedicated hardware while others are realized by software or the like. For example, the functions of some constituent elements can be realized by the processing circuit 10 as dedicated hardware, while the functions of other constituent elements can be realized by the processing circuit 10 as the processor 11 reading and executing the program stored in the memory 12.
 As described above, the neural network construction unit 101 can realize each of the functions above by hardware, software or the like, or a combination thereof.
 <Embodiment 2>
 FIG. 11 is a block diagram showing the configuration of the neural network device 100 according to the second embodiment. In FIG. 11, elements identical or equivalent to those shown in the first embodiment (FIG. 1) are given the same reference numerals, and their description is omitted here.
 As shown in FIG. 11, the neural network device 100 according to the second embodiment includes a neural network execution unit 901 in addition to the neural network construction unit 101. The neural network execution unit 901 includes a storage unit 905, a CPU 902, an FPGA 903, a memory 904, and a data acquisition circuit 906.
 Based on the program and circuit information created by the neural network construction unit 101, the neural network execution unit 901 executes the arithmetic processing of the neural network, with the CPU 902 and the FPGA 903 processing data cooperatively.
 The storage unit 905 stores the program and circuit information created by the neural network construction unit 101. The CPU 902 reads the program stored in the storage unit 905 and, based on it, performs the neural network arithmetic processing assigned to the CPU 902 and controls the FPGA 903. The FPGA 903 reads the circuit information stored in the storage unit 905, configures arithmetic circuits based on it, and performs the neural network arithmetic processing assigned to the FPGA 903.
 The memory 904 relays data exchanged between the CPU 902 and the FPGA 903. More specifically, the CPU 902 stores in the memory 904 the input data for computations that use the circuits built on the FPGA 903, and the FPGA 903 reads this input data and uses it for those computations. The FPGA 903 in turn stores its computation results in the memory 904, and the CPU 902 reads them from the memory 904 and uses them for software processing.
 The data acquisition circuit 906 is used when the FPGA 903 reads data from the memory 904. In this embodiment, the data acquisition circuit 906 is built on the FPGA 903 as one of its arithmetic circuits.
 In general, when an FPGA reads data from an external memory to perform a computation, it starts acquiring the data upon being notified that the necessary data has been stored at a predetermined location in the memory. The data acquisition circuit 906 makes this notification unnecessary. Specifically, the size of the input data for each circuit on the FPGA 903 is fixed in advance, and as soon as data of that size has accumulated in the memory 904, the data acquisition circuit 906 automatically transfers it to the FPGA 903. Since the input data size is fixed at the stage where the neural network construction unit 101 decides the circuit configuration, it can be calculated by the data acquisition circuit control data generation unit 303 of the neural network construction unit 101. The neural network construction unit 101 includes the input data size of each circuit, calculated by the data acquisition circuit control data generation unit 303, in the circuit information and stores it in the storage unit 905.
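 A software model of this autonomous fetch, with invented names throughout (the actual mechanism is a hardware circuit configured from the circuit information):

```python
# Model of the data acquisition behaviour: each circuit's input size is
# known in advance, and computation starts as soon as the shared memory
# holds that many bytes, without an execution command from the CPU.
class DataAcquisitionModel:
    def __init__(self, input_sizes):
        self.input_sizes = input_sizes   # circuit id -> expected bytes
        self.buffers = {cid: bytearray() for cid in input_sizes}

    def on_write(self, circuit_id, chunk, start):
        """Called whenever the CPU writes a chunk into shared memory."""
        buf = self.buffers[circuit_id]
        buf.extend(chunk)
        if len(buf) >= self.input_sizes[circuit_id]:
            data, self.buffers[circuit_id] = bytes(buf), bytearray()
            start(circuit_id, data)      # begin computation autonomously
```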
 Although the CPU 902, the FPGA 903, and the memory 904 are shown as separate blocks in FIG. 11, they may all be mounted on a single-chip SoC (System-on-a-Chip).
 The embodiments may be freely combined, and each embodiment may be modified or omitted as appropriate.
 The above description is in all respects illustrative, and countless variations not exemplified here can be envisaged.
 100 neural network device, 101 neural network construction unit, 102 neural network analysis unit, 103 neural network calculation method output unit, 104 storage unit, 201 network structure analysis unit, 202 neural network division unit, 301 control program creation unit, 302 arithmetic circuit creation unit, 303 data acquisition circuit control data generation unit, 401 arithmetic structure classification unit, 402 convolution layer analysis unit, 403 activation layer analysis unit, 501 convolution layer circuitization unit, 502 activation layer circuitization unit, 503 circuit scale calculation unit, 504 circuitization location determination unit, 901 neural network execution unit, 902 CPU, 903 FPGA, 904 memory, 905 storage unit, 906 data acquisition circuit, 10 processing circuit, 11 processor, 12 memory.

Claims (6)

  1.  A neural network device comprising:
     a neural network analysis unit that determines, for each operation constituting a neural network, a calculation method, namely whether to circuitize the operation or to process it in software; and
     a neural network calculation method output unit that creates and outputs circuit information for circuitizing the operations determined to be circuitized and a program for software-processing the operations determined to be processed in software, wherein
     the neural network analysis unit includes:
     a network structure analysis unit that analyzes the arithmetic structure of the neural network; and
     a neural network division unit that determines, for each of the operations obtained by dividing the neural network, whether to circuitize the operation or to process it in software,
     the network structure analysis unit includes:
     an arithmetic structure classification unit that classifies each layer of the neural network according to the type of operation constituting the layer; and
     a convolution layer analysis unit that identifies parameters of the convolution operation for each layer classified by the arithmetic structure classification unit as a layer performing a convolution operation, and
     the neural network division unit includes:
     a convolution layer circuitization unit that groups layers having identical or similar parameters based on the parameters identified by the convolution layer analysis unit;
     a circuit scale calculation unit that calculates, for each convolution operation of the layers grouped by the convolution layer circuitization unit, the circuit scale that circuitizing the convolution operation would require; and
     a circuitization location determination unit that determines the operations to be circuitized based on the circuit scale calculated by the circuit scale calculation unit.
  2.  The neural network device according to claim 1, wherein
     the neural network calculation method output unit includes:
     an arithmetic circuit creation unit that creates the circuit information; and
     a control program creation unit that creates the program, and
     the program created by the control program creation unit includes a control program for managing the input and output of an arithmetic circuit constructed based on the circuit information.
  3.  The neural network device according to claim 1 or claim 2, wherein
     the network structure analysis unit further includes an activation layer analysis unit that identifies the activation function used in each layer classified by the arithmetic structure classification unit as a layer performing processing based on an activation function,
     the neural network division unit further includes an activation layer circuitization unit that groups identical activation functions among the activation functions used in the layers so classified, and that stores, for those grouped activation functions that can be linearly approximated, the linear approximation function obtained by the approximation in association with the activation function, and
     the circuit scale calculation unit further calculates the circuit scale that circuitizing each of the grouped activation functions and the linear approximation functions would require.
  4.  The neural network device according to any one of claims 1 to 3, further comprising a neural network execution unit that executes neural network processing based on the circuit information and the program output by the neural network calculation method output unit, wherein
     the neural network execution unit includes:
     a storage unit that stores the circuit information and the program;
     a CPU that executes the program;
     an FPGA that constructs arithmetic circuits based on the circuit information and executes computations using the arithmetic circuits; and
     a memory for relaying data between the CPU and the FPGA.
  5.  The neural network device according to claim 4, wherein
     the neural network execution unit further includes a data acquisition circuit that automatically acquires, from the memory, the data passed from the CPU to the FPGA via the memory and loads it into the FPGA, and
     the neural network calculation method output unit further includes a data acquisition circuit control data generation unit that creates data for controlling the data acquisition circuit based on the circuit information.
  6.  The neural network device according to claim 5, wherein the data for controlling the data acquisition circuit is data indicating the size of the input data of the arithmetic circuit.
PCT/JP2022/010523 2022-03-10 2022-03-10 Neural network device WO2023170855A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2022543749A JP7179237B1 (en) 2022-03-10 2022-03-10 neural network device
PCT/JP2022/010523 WO2023170855A1 (en) 2022-03-10 2022-03-10 Neural network device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/010523 WO2023170855A1 (en) 2022-03-10 2022-03-10 Neural network device

Publications (1)

Publication Number Publication Date
WO2023170855A1 true WO2023170855A1 (en) 2023-09-14

Family

ID=84227638

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/010523 WO2023170855A1 (en) 2022-03-10 2022-03-10 Neural network device

Country Status (2)

Country Link
JP (1) JP7179237B1 (en)
WO (1) WO2023170855A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013125419A (en) * 2011-12-14 2013-06-24 Fuji Xerox Co Ltd Hardware-software collaborative design device and program
US20190266504A1 (en) * 2019-05-09 2019-08-29 Intel Corporation Using computational cost and instantaneous load analysis for intelligent deployment of neural networks on multiple hardware executors
JP2020129404A (en) * 2015-10-28 2020-08-27 グーグル エルエルシー Processing computational graphs


Also Published As

Publication number Publication date
JPWO2023170855A1 (en) 2023-09-14
JP7179237B1 (en) 2022-11-28

Similar Documents

Publication Publication Date Title
CN111652368B (en) Data processing method and related product
WO2019216404A1 (en) Neural network construction device, information processing device, neural network construction method, and program
EP3525119B1 (en) Fpga converter for deep learning models
US7424595B2 (en) System for managing circuitry of variable function information processing circuit and method for managing circuitry of variable function information processing circuit
WO2001090887A1 (en) Method for processing program for high-speed processing by using dynamically reconfigurable hardware and program for executing the processing method
WO2021044241A1 (en) Deep neural network on field-programmable gate array
KR102167747B1 (en) Apparatus and Method of managing Mobile device memory for analyzing a user utilization pattern by a neural network algorithm to predict a next application
JP7059214B2 (en) Arithmetic logic unit
JP2001229217A (en) Higher-order synthesizing method and recording medium used for its implementation
CN110968404B (en) Equipment data processing method and device
JP4968478B2 (en) Method for reconstructing a statement and computer system having the function
CN114970866B (en) Quantum computing task computing method, device and readable storage medium
Masadeh et al. A quality-assured approximate hardware accelerators–based on machine learning and dynamic partial reconfiguration
WO2023170855A1 (en) Neural network device
Perepelitsyn et al. Technological Stack for Implementation of AI as a Service based on Hardware Accelerators
Eassa et al. RISC-V based implementation of Programmable Logic Controller on FPGA for Industry 4.0
US20190057125A1 (en) System and method for managing log data
EP4066146A1 (en) Systems and methods for implementing operational transformations for restricted computations of a mixed-signal integrated circuit
Wang et al. Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation
US6233732B1 (en) Compiling system using intermediate codes to store a plurality of values
JP3370304B2 (en) High-level synthesis system, high-level synthesis method, and recording medium used for implementing high-level synthesis method
US20230177351A1 (en) Accelerating decision tree inferences based on tensor operations
US11347490B1 (en) Compilation framework for hardware configuration generation
JP7042870B2 (en) Methods, devices, devices and computer-readable storage media performed by computing devices
EP3997593B1 (en) A streaming compiler for automatic adjoint differentiation

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2022543749

Country of ref document: JP

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22930836

Country of ref document: EP

Kind code of ref document: A1