WO2021240633A1 - Information processing circuit and method for designing information processing circuit - Google Patents

Information processing circuit and method for designing information processing circuit

Info

Publication number
WO2021240633A1
Authority
WO
WIPO (PCT)
Prior art keywords
circuit
parameter
parameter value
value output
information processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2020/020701
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
崇 竹中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to JP2022527306A (JP7456501B2)
Priority to US17/926,728 (US20230205957A1)
Priority to PCT/JP2020/020701 (WO2021240633A1)
Priority to TW110114833A (TWI841838B)
Publication of WO2021240633A1
Anticipated expiration
Legal status: Ceased (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/32 Circuit design at the digital level
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/02 Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]

Definitions

  • The present invention relates to an information processing circuit that executes the inference phase of deep learning, and to a method for designing such an information processing circuit.
  • Deep learning is an algorithm that uses a multi-layered neural network (hereinafter referred to as a network).
  • In deep learning, a learning phase, in which each network (layer) is optimized to create a model (learning model), and an inference phase, in which inference is performed based on the learning model, are executed.
  • The model is sometimes called an inference model.
  • The model may also be referred to below as an inference device.
  • an inference device realized by a GPU (Graphics Processing Unit)
  • a CPU (Central Processing Unit)
  • an accelerator dedicated to deep learning has been put into practical use.
  • FIG. 20 is an explanatory diagram showing the structure of VGG (Visual Geometry Group)-16, which is an example of a convolutional neural network (CNN).
  • VGG-16 includes 13 convolutional layers and 3 fully connected layers. Features extracted in the convolutional layers, or in the convolutional layers and the pooling layers, are classified in the fully connected layers.
  • The convolution layer is a 3 × 3 convolution. Therefore, for example, the first convolution operation in FIG. 20 includes a product-sum operation of 3 (vertical size) × 3 (horizontal size) × 3 (input channels) × 64 (output channels) per pixel. Further, for example, the convolution layer of the fifth block in FIG. 20 includes a product-sum operation of 3 (vertical size) × 3 (horizontal size) × 512 (input channels) × 512 (output channels) per pixel. “P” indicates a pooling layer. In the CNN shown in FIG. 20, the pooling layer is a Max Pooling layer.
  • “F” indicates a fully connected layer.
  • “O” indicates an output layer.
  • The convolution layers and the fully connected layers include a rectified linear unit (ReLU).
  • The multiplication expression attached to each layer represents the vertical size × horizontal size × number of channels of the data corresponding to one input image. The volume of the rectangular parallelepiped representing a layer corresponds to the amount of activation in that layer.
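The per-pixel product-sum counts quoted for FIG. 20 can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `macs_per_pixel` is illustrative and does not appear in the patent.

```python
def macs_per_pixel(kernel_h: int, kernel_w: int, in_ch: int, out_ch: int) -> int:
    """Product-sum (multiply-accumulate) operations needed per pixel of a convolution layer."""
    return kernel_h * kernel_w * in_ch * out_ch


# First convolution in FIG. 20: 3 x 3 kernel, 3 input channels, 64 output channels.
first_conv = macs_per_pixel(3, 3, 3, 64)
# Convolution layer of the fifth block: 3 x 3 kernel, 512 input and 512 output channels.
fifth_block = macs_per_pixel(3, 3, 512, 512)

print(first_conv)   # 1728
print(fifth_block)  # 2359296
```

The fifth-block layer needs roughly 1,365 times as many product-sum operations per pixel as the first convolution, which is why the per-layer circuit scale matters later in the description.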
  • The CNN is configured such that the operations of the plurality of layers constituting the CNN are performed by a common arithmetic unit (see, for example, paragraph 0033 of Patent Document 1).
  • When the operation of each layer of the inference device is executed, the arithmetic unit 700 reads the parameters for the layer to be executed from the DRAM 900. The arithmetic unit 700 then executes the product-sum operation of that layer with the parameters as coefficients.
  • FIG. 22 is an explanatory diagram schematically showing a CNN provided with an arithmetic unit corresponding to each layer.
  • FIG. 22 illustrates six layers 801, 802, 803, 804, 805, and 806 in the CNN.
  • Arithmetic units (circuits) 701, 702, 703, 704, 705, 706 corresponding to each of the layers 801, 802, 803, 804, 805, 806 are provided.
  • the CNN function is executed without changing the circuit configuration of the arithmetic units 701 to 706 even if the parameters are changed.
  • The data transfer speed of the DRAM 900 is lower than the calculation speed of the arithmetic unit 700. That is, the memory bandwidth of the DRAM 900 is narrow. Therefore, the data transfer between the arithmetic unit 700 and the memory becomes a bottleneck. As a result, the calculation speed of the CNN is limited.
  • The circuit scale of the adders and multipliers of the CNN as a whole becomes smaller.
  • Since the circuit is configured to perform completely parallel processing for each layer (fully parallel), that is, to process the operations corresponding to each input channel and each output channel in parallel for each layer, such a circuit configuration increases the circuit scale. Further, since the circuit is configured so that completely parallel processing is possible for each layer, it is desirable that the processing time of the input data corresponding to one image be the same for each layer.
  • In a CNN, the higher the layer (that is, the closer the layer is to the output layer), the smaller the vertical and horizontal sizes of the input data corresponding to one image may be.
  • The pooling layer reduces the vertical and horizontal sizes of the input data corresponding to one image. If each layer processes the data corresponding to one input image in the same time, the amount of calculation in a higher layer is small unless the number of channels in that layer is extremely increased. In other words, the higher the layer, the smaller the circuit scale needed to execute the operations of that layer.
  • In the configuration of the CNN described in Non-Patent Document 1, the CNN is divided into two stages, and an arithmetic unit corresponding to each layer is provided in the preceding stage. The subsequent stage is configured so that the parameters are transferred to the DRAM and a programmable accelerator is used as the arithmetic unit. That is, Non-Patent Document 1 describes a CNN configured to accommodate changes in the parameters and the network configuration to some extent; it does not describe fixing the parameters and the network configuration for the CNN as a whole, that is, for the inference device as a whole.
  • When the parameters and network structure of the inference device as a whole are fixed, it is difficult to change the network structure and weights (parameters) of deep learning once the circuit is manufactured.
  • For example, a circuit manufactured as a face recognition chip can only be used for face recognition. That is, it is difficult for a circuit with fixed parameters and network structure to support other types of CNN.
  • An object of the present invention is to provide an information processing circuit, and a method for designing an information processing circuit, that, when the inference device is realized in hardware, is freed from the memory bandwidth limitation and improves the utilization rate of the arithmetic unit of each layer in the inference device.
  • The information processing circuit is an information processing circuit that executes layer operations in deep learning, and includes a product-sum circuit that performs product-sum operations using input data and parameter values, and a parameter value output circuit that outputs the parameter values.
  • The parameter value output circuit is composed of a combinational circuit, and includes a first parameter value output circuit manufactured by a method in which the circuit configuration cannot be changed and a second parameter value output circuit manufactured by a method in which the circuit configuration can be changed.
  • The information processing circuit design method is a method for generating an information processing circuit that executes layer operations in deep learning. In this method, a plurality of types of parameter sets, each including a plurality of learned parameter values, and data that can specify the network structure are input; a product-sum circuit that performs product-sum operations using input data and parameter values and is specialized for a layer in the network structure is created; and, as combinational circuits that output the parameter values in the plurality of types of parameter sets, a first parameter value output circuit realized by a method in which the circuit configuration cannot be changed and a second parameter value output circuit realized by a method in which the circuit configuration can be changed are created.
  • The information processing circuit design program is a program for generating an information processing circuit that executes layer operations in deep learning. It causes a computer to execute a process of inputting a plurality of types of parameter sets, each including a plurality of learned parameter values, and data that can specify the network structure, and a process of creating a product-sum circuit that performs product-sum operations using input data and parameter values and is specialized for a layer in the network structure.
  • It further causes the computer to execute, for the combinational circuits that output the parameter values in the plurality of types of parameter sets, a process of creating the first parameter value output circuit realized by a method in which the circuit configuration cannot be changed and a process of creating the second parameter value output circuit realized by a method in which the circuit configuration can be changed.
  • the information processing circuit design apparatus uses an input means for inputting a plurality of types of parameter sets including a plurality of learned parameter values and data capable of specifying a network structure, and a product using the input data and the parameter values.
  • The parameter value output circuit creating means includes a first parameter value output circuit creating means for creating the first parameter value output circuit, which is realized by a method in which the circuit configuration cannot be changed, and a second parameter value output circuit creating means for creating the second parameter value output circuit, which is realized by a method in which the circuit configuration can be changed.
  • the information processing circuit is a CNN inference device provided with an arithmetic unit corresponding to each layer of the CNN.
  • The CNN inference device is realized with the parameters fixed and with the network configuration fixed (the type of deep learning algorithm, which types of layers are arranged in what order, the sizes of the input data and output data of each layer, and so on).
  • The information processing circuit is a circuit having a circuit configuration specialized for each layer of the CNN (for example, each of the convolution layers and the fully connected layers). Specialized means that the circuit is a dedicated circuit that exclusively executes the operations of the relevant layer.
  • That the parameters are fixed means that the processing of the learning phase is completed, appropriate parameters are determined, and the determined parameters are used.
  • the parameters determined in the learning phase may be changed.
  • changing a parameter may be expressed as optimizing the parameter.
  • the degree of parallelism is determined in consideration of the data input speed, the processing speed, and the like.
  • The multiplier of the parameters (weights) and the input data in the inference device is composed of a combinational logic circuit (combinational circuit). Alternatively, it may be composed of a pipelined arithmetic unit or of a sequential circuit.
  • FIG. 1 is an explanatory diagram schematically showing an information processing circuit according to the first embodiment.
  • FIG. 1 illustrates the arithmetic units 201, 202, 203, 204, 205, 206 in the information processing circuit 100 that realizes CNN. That is, FIG. 1 illustrates 6 layers of CNN.
  • Each of the arithmetic units 201, 202, 203, 204, 205, and 206 executes a product-sum operation on the input data and the parameters 211, 212, 213, 214, 215, and 216 used in the corresponding layer.
  • The arithmetic units 201 to 206 are realized by a plurality of combinational circuits.
  • The parameters 211 to 216 are also realized by combinational circuits.
  • A combinational circuit is a NAND circuit (negated logical product), a NOR circuit (negated logical sum), a NOT circuit (negation; inverter circuit), or a combination thereof.
  • A single circuit element may be referred to as a combinational circuit, and a circuit including a plurality of circuit elements (NAND circuits, NOR circuits, NOT circuits, and so on) may also be referred to as a combinational circuit.
  • parallel operations are executed in each of the arithmetic units 201 to 206, but a circuit that executes one operation in the parallel operations is used as a basic circuit.
  • the basic circuit is predetermined according to the type of layer.
  • the basic circuit 300 includes a product-sum circuit 301 that multiplies the input data and the parameter values from the parameter table (weight table) 302 and adds the multiplied values.
  • the input data may be one value. Further, the input data may be a set of a plurality of values.
  • FIG. 2 shows a parameter table 302 for storing parameter values, but in reality the parameter values are not stored in a storage unit (storage circuit); the parameter table 302 is realized by a combinational circuit.
  • Since the parameters are fixed, the parameter values, which are fixed values, are output from the parameter table 302.
  • The parameter table 302 may output one value or a set of a plurality of values.
  • The product-sum circuit 301 may multiply one input value by one parameter value. The product-sum circuit 301 may also multiply a set of input values by a set of parameter values, and may calculate the total sum of the resulting set of products. In general, a plurality of parameters or a plurality of sets of parameters are used for one layer, and the control unit 400 controls which parameter is output.
  • the basic circuit 300 may include a register 303 that temporarily stores the product-sum operation value.
  • the product-sum circuit 301 may include an adder that adds a plurality of multiplication values temporarily stored in the register 303.
  • the output of another basic circuit 300 may be connected to the input of the basic circuit 300.
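The behavior of the basic circuit 300 described above can be mimicked in software. The following is a minimal sketch under stated assumptions: the class and function names are hypothetical, and a frozen tuple merely stands in for the combinational-logic parameter table 302 (in the circuit the values are produced by logic gates, not read from storage).

```python
from typing import Callable, Sequence


def make_parameter_table(values: Sequence[int]) -> Callable[[int], int]:
    """Return a read-only lookup standing in for parameter table 302."""
    frozen = tuple(values)  # fixed once "manufactured"

    def table(address: int) -> int:
        return frozen[address]

    return table


class BasicCircuit:
    """Software stand-in for basic circuit 300 (product-sum circuit 301 plus register 303)."""

    def __init__(self, table: Callable[[int], int]) -> None:
        self.table = table
        self.register = 0  # register 303: holds the running product-sum value

    def step(self, x: int, address: int) -> int:
        """One multiply-accumulate; the address is what control unit 400 would supply."""
        self.register += x * self.table(address)
        return self.register


circuit = BasicCircuit(make_parameter_table([2, -1, 3]))
for address, x in enumerate([10, 20, 30]):
    circuit.step(x, address)

print(circuit.register)  # 10*2 + 20*(-1) + 30*3 = 90
```

Chaining, as mentioned in the text, would correspond to feeding one circuit's `register` value in as the `x` of another.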
  • FIG. 3 is an explanatory diagram for explaining a circuit configuration example of the parameter table 302.
  • FIG. 3A shows an example of the truth table 311.
  • The truth table 311 can be realized by a combinational circuit.
  • Each of A, B, and C is an input of the combinational circuit.
  • Z1 and Z2 are outputs of the combinational circuit.
  • Although FIG. 3A shows the truth table 311 of a full adder as an example, A, B, and C can be regarded as addresses, and Z1 and Z2 can be regarded as output data. That is, Z1 and Z2 can be regarded as the output data for the designated addresses A, B, and C.
  • By associating the output data with the parameter values, a desired parameter value can be obtained according to some input (designated address).
  • For example, if the desired parameter value can be determined regardless of a specific input value (A in the truth table 311), it suffices to use the truth table 312, which is simplified so that the parameter value is determined by the inputs B and C of the truth table 311.
  • the circuit scale of the combinational circuit becomes smaller as the number of different types of inputs for determining the parameters decreases.
  • a known technique such as the Quine-McCluskey method is used to simplify the truth table.
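To make the address/output reading of the truth table concrete, the sketch below builds a full-adder truth table as a lookup keyed by (A, B, C) and shows how an input that the outputs do not depend on can be dropped, halving the table. This is not the Quine-McCluskey method itself, only the simpler "ignore an irrelevant input" case described for truth table 312. Which of Z1 and Z2 is the sum and which is the carry is an assumption, and the `params` table is hypothetical.

```python
from itertools import product

# Truth table of a full adder, keyed by the "address" (A, B, C).
# Assumption: Z1 is the sum bit and Z2 the carry bit; the text does not say which is which.
full_adder = {
    (a, b, c): ((a + b + c) % 2, (a + b + c) // 2)
    for a, b, c in product((0, 1), repeat=3)
}


def depends_on(table: dict, index: int) -> bool:
    """True if flipping input `index` changes the output for at least one address."""
    for addr, out in table.items():
        flipped = list(addr)
        flipped[index] ^= 1
        if table[tuple(flipped)] != out:
            return True
    return False


def drop_input(table: dict, index: int) -> dict:
    """Remove an input that the outputs do not depend on, halving the table."""
    if depends_on(table, index):
        raise ValueError("outputs depend on this input; it cannot be dropped")
    return {addr[:index] + addr[index + 1:]: out for addr, out in table.items()}


# Hypothetical parameter table whose outputs ignore A (the situation of truth table 312).
params = {(a, b, c): (b ^ c, b & c) for a, b, c in product((0, 1), repeat=3)}
reduced = drop_input(params, 0)

print(full_adder[(1, 1, 0)])       # (0, 1)
print(depends_on(full_adder, 0))   # True
print(len(params), len(reduced))   # 8 4
```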
  • the arithmetic unit 203 shown in FIG. 2 includes a control unit 400.
  • The control unit 400 supplies the data of the designated address corresponding to the desired output data to the parameter table 302 at a desired timing.
  • The parameter table 302 outputs the output data corresponding to the designated address, that is, the parameter value, to the product-sum circuit 301.
  • the desired timing is when the product-sum circuit 301 executes the multiplication process using the parameter values to be output from the parameter table 302.
  • FIG. 4 is a block diagram showing an example of an information processing circuit design device for designing the circuit configuration of the parameter table of each layer of the CNN and the circuit configuration of the arithmetic unit.
  • the information processing circuit design device 500 includes a parameter table optimization unit 501, a parameter table generation unit 502, a parallel degree determination unit 503, and an arithmetic unit generation unit 504.
  • The parallel degree determination unit 503 receives a network structure (specifically, data indicating the network structure) as input.
  • The arithmetic unit generation unit 504 outputs the circuit configuration of the arithmetic unit for each layer.
  • The parameter table optimization unit 501 receives as input the parameter set (the weights in each layer) learned in the learning phase and the degree of parallelism determined by the parallel degree determination unit 503.
  • the parameter table generation unit 502 outputs the circuit configuration of the parameter table.
  • the parallel degree determination unit 503 determines the degree of parallelism for each layer.
  • the parameter table optimization unit 501 optimizes the parameter table based on the input parameters for each layer and the degree of parallelism for each layer determined by the parallel degree determination unit 503.
  • the number of parameter tables is determined by the degree of parallelism, but the parameter table optimization unit 501 optimizes each parameter in the plurality of parameter tables 302.
  • optimization means reducing the circuit area of the combinational circuit corresponding to the parameter table.
  • For example, when the degree of parallelism is determined to be "128", the number of basic circuits 300 is 128.
  • Each basic circuit 300 executes the processing for 1,152 product-sum operations (147,456 / 128). In that case, 128 parameter tables, each having 1,152 parameter values, are provided in the basic circuits 300.
  • The parameter table 302 is realized not by a storage circuit but by a combinational circuit.
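The division in this example is easy to verify. In the sketch below the variable names are illustrative. The total of 147,456 happens to equal 3 × 3 × 128 × 128; reading it as a 3 × 3 convolution with 128 input and 128 output channels is an assumption, since the text does not state where the figure comes from.

```python
total_products = 147_456   # product-sum operations in the layer (figure from the text)
parallelism = 128          # degree of parallelism = number of basic circuits 300

# Each basic circuit handles an equal share, so each parameter table holds this many values.
per_circuit, remainder = divmod(total_products, parallelism)

print(per_circuit, remainder)   # 1152 0
print(3 * 3 * 128 * 128)        # 147456
```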
  • the parameter table optimization unit 501 optimizes the parameter values of the parameter table 302 by using a predetermined method.
  • the parameter table generation unit 502 outputs the circuit configuration for realizing the parameter table 302 having the optimized parameter values as the circuit configuration of the parameter table.
  • The parameter table optimization unit 501 receives the parameter set (a plurality of parameter values) learned in the learning phase, and the parallel degree determination unit 503 receives data indicating a predetermined network structure (step S11).
  • inputting data indicating the network structure is expressed as inputting the network structure.
  • the parallel degree determination unit 503 determines the parallel degree for each layer (step S12). As an example, the parallel degree determination unit 503 determines the parallel degree N by the equation (1). For example, when the number of layers specified by the input type of deep learning algorithm is 19, the parallel degree determination unit 503 determines the degree of parallelism of each of the 19 layers.
  • N = C_L / D_L ... (1)
  • C_L denotes the number of clocks required to process all pixels of one screen in the layer whose degree of parallelism is being determined (the target layer) with a single product-sum unit.
  • D_L denotes the number of clocks required (the number of clocks allowed) for processing one screen in the target layer.
  • For example, assume that in a layer whose screen has a vertical size of 224 and a horizontal size of 224 (50,176 pixels; referred to as the layer of the first block), one pixel is processed in one clock, so that the entire screen is processed in 50,176 clocks.
  • In contrast, if the processing of one pixel is executed in 256 clocks, the processing for one screen can be completed in 50,176 clocks, the same as in the first block.
  • the degree of parallelism of the layer of the fifth block is 9,216.
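Equation (1) and the 9,216 figure can be reproduced numerically. The sketch below takes C_L as (pixels per screen) × (product-sum operations per pixel) and D_L as the 50,176-clock budget. The 14 × 14 screen size for the fifth block is an assumption taken from the standard VGG-16 structure (it is also what 50,176 / 256 gives), and rounding up when the division is inexact is likewise an assumption not stated in the text.

```python
def degree_of_parallelism(pixels: int, macs_per_pixel: int, allowed_clocks: int) -> int:
    """N = C_L / D_L, where C_L = pixels * macs_per_pixel (clocks on one product-sum unit)."""
    c_l = pixels * macs_per_pixel
    n, remainder = divmod(c_l, allowed_clocks)
    if remainder:
        n += 1  # assumption: round up so the layer still meets its clock budget
    return n


D_L = 224 * 224  # 50,176 clocks allowed per screen, set by the first-block layer

# Fifth-block layer: assumed 14 x 14 pixels, 3*3*512*512 product-sum operations per pixel.
n_fifth = degree_of_parallelism(14 * 14, 3 * 3 * 512 * 512, D_L)

print(D_L // (14 * 14))  # 256 clocks available per pixel
print(n_fifth)           # 9216
```

Under the same budget, a fully parallel implementation of that layer would need 2,359,296 product-sum units, so the chosen degree of parallelism is smaller by a factor of 256.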
  • The arithmetic unit of each layer (specifically, the plurality of basic circuits 300 included in the arithmetic unit) can be kept in operation at all times. In the configuration shown in FIG. 22, when no special measure is applied to the arithmetic units 701 to 706, the operating rate of the arithmetic unit 706 is lower than the operating rate of the arithmetic unit 701.
  • Taking the configuration described in Non-Patent Document 1 as an example, since each layer is fully parallel, the operating rate of the arithmetic unit is lower in layers close to the output layer. In the present embodiment, however, it is possible to maintain a high operating rate for the arithmetic units of all layers.
  • the parameter table optimization unit 501 generates the parameter table 302 for each layer according to the determined degree of parallelism (step S13). Further, the parameter table optimization unit 501 optimizes the generated parameter table 302 (step S14).
  • FIG. 6 is a flowchart showing an example of the process of optimizing the parameter table 302 (parameter table optimization process).
  • the parameter table optimization unit 501 confirms whether or not the recognition accuracy is equal to or higher than the first reference value (step S142).
  • the first reference value is a predetermined threshold value.
  • the parameter table optimization unit 501 estimates the circuit area of the parameter table 302. Then, it is confirmed whether or not the circuit area of the parameter table 302 is equal to or less than the second reference value (step S144).
  • the second reference value is a predetermined threshold value.
  • the parameter table optimization unit 501 can estimate the circuit area of the parameter table 302 based on, for example, the number of logic circuits in the combinational circuit constituting the parameter table 302.
  • the parameter table optimization unit 501 ends the parameter table optimization process.
  • In step S143, when the recognition accuracy is less than the first reference value, the parameter table optimization unit 501 changes the parameter values in the direction in which the recognition accuracy is expected to improve.
  • The parameter table optimization unit 501 may change the parameter values by trial and error.
  • In step S143, when the circuit area of the parameter table 302 exceeds the second reference value, the parameter table optimization unit 501 changes the parameter values so that the circuit area of the parameter table 302 becomes smaller.
  • Methods of changing the parameter values to reduce the circuit area of the parameter table 302 include, for example, the following.
  • Change parameter values whose absolute value is smaller than a predetermined threshold value to 0.
  • Replace parameter values (positive numbers) larger than a predetermined threshold value with the maximum parameter value in the parameter table 302.
  • Replace parameter values (negative numbers) smaller than a predetermined threshold value with the minimum parameter value in the parameter table 302.
  • Set a representative value for each predetermined area in the parameter table 302, and replace all the parameter values in the area with the representative value.
  • The representative value is, for example, an even value, an odd value, or the mode.
  • the parameter table optimization unit 501 may use one of the above-mentioned plurality of methods, or may use two or more of the above-mentioned plurality of methods in combination.
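The parameter-changing methods listed above can be sketched on a flat list of integer parameter values. The function names, thresholds, region size, and sample table below are all illustrative assumptions; the mode is used as the representative value, which is one of the options the text mentions.

```python
from statistics import mode


def prune_small(values, threshold):
    """Change values whose absolute value is below the threshold to 0."""
    return [0 if abs(v) < threshold else v for v in values]


def saturate(values, upper, lower):
    """Replace values above `upper` with the table maximum and values below `lower` with the table minimum."""
    table_max, table_min = max(values), min(values)
    return [table_max if v > upper else table_min if v < lower else v for v in values]


def use_representative(values, region_size):
    """Replace every value in each fixed-size region with that region's mode."""
    out = []
    for start in range(0, len(values), region_size):
        region = values[start:start + region_size]
        out.extend([mode(region)] * len(region))
    return out


table = [1, 1, 7, 9, -8, 9, 2, 3, 2]  # hypothetical 3 x 3 table, row by row

print(prune_small(table, 2))         # [0, 0, 7, 9, -8, 9, 2, 3, 2]
print(saturate(table, 6, -6))        # [1, 1, 9, 9, -8, 9, 2, 3, 2]
print(use_representative(table, 3))  # [1, 1, 1, 9, 9, 9, 2, 2, 2]
print(len(set(table)), len(set(use_representative(table, 3))))  # 6 3
```

The last line shows the intended effect: fewer distinct values in the table, which is expected to give a smaller combinational circuit.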
  • FIG. 7 is an explanatory diagram showing an example of a method of changing a parameter value.
  • FIG. 7 illustrates a parameter table having a size of 3 × 3.
  • FIG. 7A shows the parameter table 302a before the parameter value is changed.
  • FIG. 7B shows the parameter table 302b after the parameter values have been changed.
  • The purpose common to each of the above methods is to make the same value appear frequently in the parameter table 302, that is, to increase the number of parameter values having the same value, or to make the same pattern continuous.
  • That the same pattern is continuous means, for example, that the pattern of the parameter values "1", "2", and "3" (an example of the same pattern) appears repeatedly in succession.
  • The fewer the distinct parameter values, the smaller the circuit scale of the combinational circuit. Even when the same pattern is continuous, the circuit scale of the combinational circuit is expected to be smaller.
  • When the recognition accuracy of the inference device is equal to or higher than a desired level (specifically, equal to or higher than the first reference value) and the circuit area is equal to or smaller than a desired size (specifically, equal to or less than the second reference value), the parameter table optimization process is terminated.
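The control flow of the parameter table optimization process can be sketched as a loop that keeps changing the parameter values until both the accuracy check and the area check pass. This is only a sketch under stated assumptions: the function signatures, the iteration cap, and all the toy measurement functions are hypothetical stand-ins; in particular, the number of distinct values is used as a crude proxy for circuit area, whereas the text estimates area from the number of logic circuits.

```python
def optimize_parameter_table(params, measure_accuracy, estimate_area,
                             improve_accuracy, shrink_area,
                             min_accuracy, max_area, max_iterations=100):
    """Repeat parameter changes until both the accuracy and the area targets are met."""
    for _ in range(max_iterations):
        if measure_accuracy(params) < min_accuracy:   # check of step S142
            params = improve_accuracy(params)         # change of step S143
            continue
        if estimate_area(params) > max_area:          # check of step S144
            params = shrink_area(params)              # change of step S143
            continue
        return params                                 # both targets met: terminate
    raise RuntimeError("no parameter set met both targets within the iteration cap")


# Toy stand-ins (all hypothetical).
reference = [5, -4, 3, 1, -1, 2, 5, 3]


def toy_accuracy(p):
    """Fraction of entries unchanged from the learned reference values."""
    return sum(a == b for a, b in zip(p, reference)) / len(reference)


def toy_area(p):
    """Area proxy: number of distinct parameter values."""
    return len(set(p))


def toy_shrink(p):
    """Zero out the entries with the smallest nonzero magnitude."""
    smallest = min(abs(v) for v in p if v != 0)
    return [0 if abs(v) == smallest else v for v in p]


result = optimize_parameter_table(
    list(reference), toy_accuracy, toy_area,
    improve_accuracy=lambda p: list(reference), shrink_area=toy_shrink,
    min_accuracy=0.5, max_area=5,
)
print(result)  # [5, -4, 3, 0, 0, 2, 5, 3]
```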
  • The arithmetic unit generation unit 504 generates and outputs the circuit configuration of the arithmetic unit for each layer (steps S15 and S17). That is, the arithmetic unit generation unit 504 outputs the circuit configuration of the arithmetic unit according to the degree of parallelism for each layer determined by the parallel degree determination unit 503.
  • Since the basic circuit 300 of each layer is predetermined, the arithmetic unit generation unit 504 generates basic circuits 300 (specifically, product-sum circuits 301 specialized for the layer) in the number corresponding to the degree of parallelism determined by the parallel degree determination unit 503.
  • The parameter table generation unit 502 generates and outputs the circuit configuration of the parameter table 302 (steps S16 and S17). That is, the parameter table generation unit 502 generates and outputs a circuit configuration for outputting the parameter values optimized by the parameter table optimization unit 501.
  • The circuit configuration for outputting the parameter values is, for example, the configuration of a combinational circuit that realizes a truth table as illustrated in FIG. 3B.
  • Because the parallel degree determination unit 503 determines an appropriate degree of parallelism, the effect of reducing the circuit scale can be obtained.
  • Since the parameter table 302 is realized by a combinational circuit, the processing speed is improved compared with an information processing circuit configured to read the parameter values from a memory, as shown in FIG. 21.
  • Since the degree of parallelism of each layer in the inference device is determined according to the calculation speed desired for that layer, the operating rate of the arithmetic units of all layers can be kept high compared with the case where each layer is fully parallel.
  • The circuit scale is smaller than in the case where each layer is fully parallel. As a result, the power consumption of the inference device is reduced.
  • the circuit scale of the inference device can be made smaller.
  • The information processing circuit has been described by taking a CNN inference device as an example, but the present embodiment can be applied to other networks having a layer that performs an operation using input data and parameter values.
  • Image data is used as the input data here, but the present embodiment can also be used in a network in which the input data is other than image data.
  • Since the power consumption of a data center is large, it is desirable to execute a deep learning algorithm with low power consumption when it is executed in the data center.
  • the power consumption is reduced, so that the information processing circuit of the present embodiment can be effectively utilized in the data center.
  • the information processing circuit of this embodiment can be effectively used even on the edge side.
  • The first type of parameter (hereinafter referred to as the first parameter) is a parameter used in common across the applications of deep learning.
  • The second type of parameter (hereinafter referred to as the second parameter) is a parameter used individually for each application.
  • FIG. 8 illustrates the arithmetic units 201, 202, 203, 204, 205, and 206 in the information processing circuit 101 that uses two types of parameters. That is, FIG. 8 illustrates six layers of a CNN using two types of parameters.
  • Each of the arithmetic units 201, 202, 203, 204, 205, and 206 executes a product-sum operation on the input data and on the first parameters 221, 222, 223, 224, 225, and 226 and the second parameters 231, 232, 233, 234, 235, and 236 used in the corresponding layer.
  • the arithmetic units 201 to 206 are realized by a plurality of combinational circuits.
  • the first parameters 221 to 226 and the second parameters 231 to 236 are also realized by a plurality of combinational circuits.
  • The difference in circuit configuration from the information processing circuit 100 of the first embodiment is that there are separate circuits constituting the first parameter and the second parameter.
  • Since the circuit configuration of the second parameter output circuit can be changed, the information it holds at the time of manufacture is arbitrary. At the time of manufacture, the second parameter output circuit may hold no individual information, or may hold any parameters according to the application. Since the second parameter output circuit is adjusted (updated) according to the application, the second parameters 231 to 236 are surrounded by dotted lines in FIG. 8 to indicate this.
  • parallel arithmetic is executed in each of the arithmetic units 201 to 206.
  • a circuit that executes one operation in parallel operation is used as a basic circuit. Further, the basic circuit is predetermined according to the type of layer.
  • FIG. 9 is an explanatory diagram showing a configuration example of the basic circuit of the information processing circuit of the second embodiment.
  • FIG. 9 illustrates the arithmetic units (circuits) 201, 202, 203, 204, 205, and 206 of the six layers, respectively. In each layer, a basic circuit 310 for the number of parallel processes is provided. Although FIG. 9 illustrates the basic circuit 310 included in the arithmetic unit 203, the arithmetic units 201, 202, 204, 205, and 206 of other layers also have the same circuit configuration.
  • the basic circuit 310 includes a product-sum circuit 301, a register 303, a first parameter table 304, and a second parameter table 305.
  • the product-sum circuit 301 is a circuit that multiplies the input data by the parameter values from the first parameter table 304 and the second parameter table 305 and adds the multiplied values, as in the first embodiment. Not all of the basic circuits 310 need to have the same configuration; for example, one or more of the plurality of basic circuits 310 may include the product-sum circuit 301, the first parameter table 304, and the second parameter table 305.
  • the first parameter table 304 corresponds to the above-mentioned first parameter output circuit, and is a table that stores parameters commonly used in each application for deep learning using the information processing circuit 101 of the present embodiment.
  • the second parameter table 305 corresponds to the above-mentioned second parameter output circuit and is a table for storing parameters individually used for each application.
  • the basic circuit 310 uses the product-sum circuit 301 to multiply the input data by the parameter values from the first parameter table 304 and the second parameter table 305 and to add the multiplied values.
  • FIG. 9 illustrates the first parameter table 304 and the second parameter table 305 that store the parameter values.
  • the first parameter table 304 and the second parameter table 305 are not stored in a storage unit (storage circuit) but, like the parameter table 302 of the first embodiment, are realized by combinational circuits.
  • the arithmetic unit 203 shown in FIG. 9 includes a control unit 400.
  • When the parameter values in the first parameter table 304 and the second parameter table 305 are realized as output data corresponding to a designated address, as in the first embodiment, the control unit 400 supplies the data of the designated address corresponding to the output data to the first parameter table 304 at a desired timing.
  • the second parameter table 305, via the first parameter table 304, outputs the output data corresponding to the designated address, that is, the parameter value, to the product-sum circuit 301.
  • the desired timing is when the product-sum circuit 301 executes the multiplication process using the parameter values to be output from the first parameter table 304 and the second parameter table 305.
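  • The data flow of one basic circuit described above can be sketched in software as follows. This is only an illustrative model, not the circuit itself: the function names, the bit logic inside `first_table`, and the way the second table adds an `offset` to the common output are all assumptions made for the example. The text only states that the address goes to the first parameter table, that the second parameter table outputs the parameter value via the first, and that the product-sum circuit multiplies and accumulates.

```python
from typing import Callable, Sequence


def first_table(address: int) -> int:
    """Common part (fixed after manufacture). The logic here is a made-up placeholder."""
    return (address * 3) & 0x7


def make_second_table(offset: int) -> Callable[[int, int], int]:
    """Individual part (adjustable per application). `offset` is a hypothetical knob."""
    def second_table(address: int, common_out: int) -> int:
        # Receives the address and the output of the common part,
        # and returns the parameter value handed to the product-sum circuit.
        return common_out + offset
    return second_table


def product_sum(inputs: Sequence[int], second_table: Callable[[int, int], int]) -> int:
    """Model of product-sum circuit 301 plus register 303 for one output value."""
    register = 0
    for address, x in enumerate(inputs):
        common_out = first_table(address)            # address -> first parameter table
        weight = second_table(address, common_out)   # first table output -> second table
        register += x * weight                       # multiply and accumulate
    return register


if __name__ == "__main__":
    print(product_sum([1, 2, 3], make_second_table(0)))
    print(product_sum([1, 2, 3], make_second_table(1)))
```

  • Swapping the function returned by `make_second_table` stands in for adjusting the second parameter table after manufacture, while `first_table` stays untouched.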
  • FIG. 10 is a block diagram showing an example of an information processing circuit design device for designing the circuit configuration of the first parameter table and the second parameter table in each layer of the CNN and the circuit configuration of the arithmetic unit.
  • the information processing circuit design device 510 includes a parameter table optimization unit 511, a parameter table generation unit 512, a parallel degree determination unit 513, an arithmetic unit generation unit 514, and a parameter table division unit 515.
  • the configuration is the same as that of the information processing circuit design device 500 of the first embodiment, except that the parameter table division unit 515 exists in the information processing circuit design device 510.
  • the parallelism determination unit 513 inputs a network structure (specifically, data indicating the network structure).
  • the arithmetic unit generation unit 514 outputs the circuit configuration of the arithmetic unit for each layer.
  • the parameter table optimization unit 511 inputs a plurality of parameter sets (weights in each layer) learned in the learning phase and the degree of parallelism determined by the parallel degree determination unit 513.
  • the plurality of parameter sets are specifically neural network parameter sets used for each application.
  • parameter set A is the parameter set used for face detection
  • parameter set B is the parameter set used for vehicle detection
  • types of parameter sets are not limited to two types, and may be three or more types.
  • the degree of parallelism determination unit 513 determines the degree of parallelism for each layer.
  • the method of determining the degree of parallelism for each layer by the degree of parallelism determination unit 513 is the same as that of the first embodiment.
  • the parameter table optimization unit 511 optimizes the parameter table for each parameter set based on the input parameters for each layer and the degree of parallelism for each layer determined by the parallel degree determination unit 513. Similar to the first embodiment, the number of parameter tables is determined by the degree of parallelism, and the parameter table optimization unit 511 optimizes each parameter in each parameter table for each parameter set. The optimization method will be described later.
  • the parameter table division unit 515 divides the parameter table of each optimized parameter set into a common part common to each parameter set and an individual part other than the common part.
  • As a combinational circuit that outputs parameter values, the parameter table division unit 515 creates a circuit divided into a combinational circuit that calculates a logical operation common to each parameter set (that is, a common part) and a combinational circuit that receives the output of the common part in addition to the parameter table input and calculates individual logical operations (that is, an individual part).
  • Alternatively, as a combinational circuit that outputs parameter values, the parameter table division unit 515 may create a circuit divided into a combinational circuit that realizes a parameter table common to each parameter set (that is, a common part) and a combinational circuit that realizes the parameter table of each parameter set excluding the common part (that is, an individual part).
  • the parameter table division unit 515 creates a logical expression representing the parameter table of each parameter set. The parameter table division unit 515 then extracts the logical expression shared by all of the created parameter sets as the common part, and treats the remaining logical expressions of each parameter set (that is, the logical expressions that are not shared) as the individual parts.
  • the combinational circuit that realizes this common part corresponds to the above-mentioned first parameter output circuit, and the combinational circuit that realizes the individual part corresponds to the above-mentioned second parameter output circuit. That is, the parameter table division unit 515 realizes the common unit and the individual unit in this way.
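  • One way to picture the division into a common part and individual parts is to treat each parameter table's logical expression as a set of product terms and intersect those sets. The sketch below assumes this sum-of-products representation; the function name `split_tables` and the sample terms are hypothetical, and an actual logic synthesis flow would work on optimized netlists rather than raw term sets.

```python
from typing import Dict, Set, Tuple

# One product term, encoded as the input combination (A, B, C) for which it is 1.
Term = Tuple[int, ...]


def split_tables(tables: Dict[str, Set[Term]]) -> Tuple[Set[Term], Dict[str, Set[Term]]]:
    """Split per-application term sets into a common part and individual parts."""
    if not tables:
        return set(), {}
    # Terms present in every parameter set form the common part.
    common = set.intersection(*tables.values())
    # Whatever is left in each parameter set is its individual part.
    individual = {name: terms - common for name, terms in tables.items()}
    return common, individual


if __name__ == "__main__":
    # Illustrative data only: two parameter sets sharing one term.
    tables = {
        "face_detection": {(1, 0, 1), (0, 1, 1)},
        "vehicle_detection": {(1, 0, 1), (1, 1, 0)},
    }
    common, individual = split_tables(tables)
    print(common)
    print(individual)
```

  • If the term sets share nothing, `common` comes back empty, which corresponds to the case where no division is performed.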
  • the arithmetic unit generation unit 514 inputs the degree of parallelism for each layer determined by the parallel degree determination unit 513.
  • the arithmetic unit generation unit 514 generates a circuit configuration in which the number of basic circuits 310 indicated by the degree of parallelism are arranged for each layer. Then, the arithmetic unit generation unit 514 outputs the generated circuit configuration for each layer as the configuration of the arithmetic unit circuit.
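  • As a rough illustration of this generation step, the sketch below arranges as many basic-circuit descriptors per layer as the degree of parallelism indicates. The `BasicCircuit` record, the layer names, and the dictionary output format are assumptions for the example; the text does not specify how the circuit configuration is represented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BasicCircuit:
    """Hypothetical descriptor for one basic circuit 310 in a layer."""
    layer: str
    index: int


def generate_arithmetic_units(parallelism: Dict[str, int]) -> Dict[str, List[BasicCircuit]]:
    """Arrange `parallelism[layer]` basic circuits for each layer."""
    return {
        layer: [BasicCircuit(layer, i) for i in range(degree)]
        for layer, degree in parallelism.items()
    }


if __name__ == "__main__":
    units = generate_arithmetic_units({"layer1": 4, "layer2": 2})
    for layer, circuits in units.items():
        print(layer, len(circuits))
```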
  • FIG. 11 is an explanatory diagram showing an example of the process of dividing the parameter table.
  • the circuit that realizes the parameter table optimized for the parameter set A is referred to as the parameter A circuit 3021
  • the circuit that realizes the parameter table optimized for the parameter set B is referred to as the parameter B circuit 3022.
  • the parameter table division unit 515 extracts the A circuit (A-1) and the B circuit (B-1) as common parts from the parameter A circuit 3021 and the parameter B circuit 3022, respectively. As a result, the A circuit (A-2) and the B circuit (B-2) become the individual parts of each parameter set. The parameter table division unit 515 then sets the A circuit (A-1) and the B circuit (B-1) as the first parameter table 304 (common part), and the A circuit (A-2) or the B circuit (B-2) as the second parameter table 305 (individual part).
  • FIG. 12 is a flowchart showing the operation of the information processing circuit design device 510 of the second embodiment.
  • the parameter table optimization unit 511 receives a plurality of types of parameter sets (each containing a plurality of parameter values) learned in the learning phase, for example for face detection and vehicle detection, and the parallel degree determination unit 513 receives data indicating a predetermined network structure (step S21).
  • the parallel degree determination unit 513 determines the degree of parallelism for each layer (step S22).
  • the method for determining the degree of parallelism is the same as the method used in the first embodiment.
  • the parameter table optimization unit 511 generates a parameter table for each layer according to the determined degree of parallelism (step S23). Further, the parameter table optimization unit 511 optimizes the generated parameter table (step S24), and divides the optimized parameter set into two (that is, a common unit and an individual unit) (step S25).
  • FIG. 13 is a flowchart showing an example of the process of optimizing the parameter table (parameter value change process).
  • the parameter table dividing unit 515 divides the parameter table for each parameter set into a common part and an individual part, respectively (step S251). However, if there is no common part, the parameter table division part 515 does not divide. Further, the parameter table optimization unit 511 measures the recognition accuracy of the CNN (inference device) using a plurality of types of parameter sets (for example, parameter set A and parameter set B) (step S252). Then, the parameter table optimization unit 511 determines whether or not the recognition accuracy of the CNN using each parameter set is equal to or higher than the reference value (hereinafter, may be referred to as an accuracy reference value) (step S253).
  • If each recognition accuracy is equal to or higher than the reference value (accuracy reference value) (Yes in step S253), the process proceeds to step S254.
  • the first reference value and the second reference value are predetermined threshold values.
  • the process proceeds to step S255.
  • the parameter table optimization unit 511 changes the parameter value (specifically, at least one of the first parameter value and the second parameter value) in the parameter table. For example, when the circuit area occupied by the common unit is less than the third reference value, the parameter table optimization unit 511 changes the parameter value of the parameter table in the direction in which the circuit area is expected to improve. When the direction in which the circuit area is expected to improve is unknown, the parameter table optimization unit 511 may change the parameter value by cut and try.
  • the parameter table optimization unit 511 repeats the processes from step S252 onward. That is, the parameter table optimization unit 511 repeatedly changes at least one of the first parameter value and the second parameter value. If the recognition accuracy does not reach the reference value as a result of changing the parameter value in step S255, the parameter table optimization unit 511 may return the changed value to its original value. If the recognition accuracy and the circuit area do not reach their reference values no matter how many times the parameter value is changed, the parameter table optimization unit 511 may terminate the parameter value change process when the number of changes reaches a limit.
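  • The parameter value change process can be sketched as a bounded search loop. The version below is a simplification under stated assumptions: the callbacks `measure_accuracy`, `common_area_ratio`, and `propose_change` are hypothetical placeholders for the accuracy measurement, the area evaluation of the common part, and the parameter change, and the single area threshold `area_ref` stands in for the reference values mentioned in the text. It accepts a change only when accuracy stays at or above the reference, reverts otherwise, and stops at a change limit.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def optimize_tables(
    tables: T,
    measure_accuracy: Callable[[T], float],
    common_area_ratio: Callable[[T], float],
    propose_change: Callable[[T], T],
    accuracy_ref: float,
    area_ref: float,
    max_changes: int,
) -> T:
    """Repeat the change step until both criteria hold or the change limit is hit."""
    for _ in range(max_changes):
        accurate = measure_accuracy(tables) >= accuracy_ref
        if accurate and common_area_ratio(tables) >= area_ref:
            break  # both the accuracy and the area criteria are satisfied
        candidate = propose_change(tables)  # must return a new object, not mutate
        if measure_accuracy(candidate) >= accuracy_ref:
            tables = candidate  # keep the change
        # otherwise the change is discarded, i.e. the value returns to the original
    return tables
```

  • Requiring `propose_change` to return a fresh object keeps the revert step trivial: a rejected candidate is simply dropped.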
  • FIG. 14 is an explanatory diagram showing an example of a truth table before changing the parameter value.
  • FIG. 14 shows an example of the truth table of the circuit A (parameter A circuit 3021) before the parameter change and the truth table of the circuit B (parameter B circuit 3022) before the parameter change.
  • each of the designated addresses A, B, and C is an input of the combinational circuit
  • the parameter Z1 is an output of the combinational circuit. That is, Z1 can be regarded as output data for the designated addresses A, B, and C.
  • each of the designated addresses A, B, and C is an input of the combinational circuit
  • the parameter Z2 is an output of the combinational circuit.
  • Z2 can be regarded as output data for the designated addresses A, B, and C.
  • the output parameter (Z1) is represented by the logical expression A & (NOT B) & C | A & B & C
  • the output parameter (Z2) is represented by the logical expression A & (NOT B) & C
  • With the common part D = A & (NOT B) & C, Z1 = D | A & B & C and Z2 = D
  • FIG. 15 is an explanatory diagram showing an example of a truth table after changing the parameter values illustrated in FIG. 14.
  • FIG. 15 shows an example of the truth table of the circuit A after the parameter change (parameter A circuit 3021) and the truth table of the circuit B after the parameter change (parameter B circuit 3022). Specifically, in the truth table 412, the parameter value (Z2) in the last line shown by the underline is changed from 0 to 1.
  • the output parameter (Z1) is represented by the logical expression A & (NOT B) & C | A & B & C
  • the output parameter (Z2) is represented by the logical expression A & (NOT B) & C | A & B & C
  • A & (NOT B) & C | A & B & C is common to each parameter table.
  • the common part (D) is larger after the change by the amount of the circuit that realizes A & B & C than before the change.
  • the individual part (Z1) is smaller after the change by the amount of the circuit that realizes A & B & C than before the change. That is, after the change, the area ratio of the common part is larger and the area ratio of the individual part is smaller than before the change. Therefore, it can be said that the area efficiency is better after the change.
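  • The effect of the parameter change on the common and individual parts can be checked by enumerating the truth tables. Because FIG. 14 and FIG. 15 are not reproduced here, the expressions below are an assumed reading: Z1 = A & (NOT B) & C | A & B & C in both cases, Z2 = A & (NOT B) & C before the change, and Z2 equal to Z1 after the last row (A = B = C = 1) is changed from 0 to 1. That reading is consistent with the statement that the common part grows, and the individual part of Z1 shrinks, by the circuit for A & B & C. The helper names are hypothetical.

```python
from itertools import product


def z1(a, b, c):
    # Circuit A: assumed unchanged by the parameter change.
    return (a and not b and c) or (a and b and c)


def z2_before(a, b, c):
    # Circuit B before the change (assumed reading of FIG. 14).
    return a and not b and c


def z2_after(a, b, c):
    # Circuit B after the last truth-table row is changed from 0 to 1.
    return (a and not b and c) or (a and b and c)


def minterms(f):
    """Input combinations (A, B, C) for which the output is 1."""
    return {bits for bits in product((0, 1), repeat=3) if f(*bits)}


def split(fa, fb):
    """Return (common part, individual part of fa, individual part of fb)."""
    ma, mb = minterms(fa), minterms(fb)
    common = ma & mb
    return common, ma - common, mb - common


if __name__ == "__main__":
    print("before:", split(z1, z2_before))
    print("after: ", split(z1, z2_after))
```

  • Counting terms is only a crude proxy for circuit area, but it shows the shift: under these assumptions the term for A & B & C moves from the individual part of Z1 into the common part.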
  • the arithmetic unit generation unit 514 generates and outputs the circuit configuration of the arithmetic unit for each layer (steps S26 and S29). That is, the arithmetic unit generation unit 514 outputs the circuit configuration of the arithmetic unit according to the parallelism degree for each layer determined by the parallelism degree determination unit 513.
  • Since the basic circuit 310 of each layer is predetermined, the arithmetic unit generation unit 514 generates a number of basic circuits 310 (specifically, product-sum circuits 301 specialized for the layer) according to the degree of parallelism determined by the parallel degree determination unit 513.
  • the parameter table generation unit 512 generates and outputs the circuit configurations of the first parameter table 304 and the second parameter table 305 (steps S27, S28, S29). That is, the parameter table generation unit 512 generates and outputs a circuit configuration for outputting the parameter values that were optimized by the parameter table optimization unit 511 and divided by the parameter table division unit 515.
  • the circuit configuration for outputting the parameter value is the configuration of the first parameter table circuit and the second parameter table circuit.
  • steps S24 to S28 are sequentially executed, but the processes of steps S25, S27, and S28 and the processes of step S26 can be executed in parallel.
  • the first parameter table circuit and the second parameter table circuit are generated by the combinational circuit.
  • the first parameter table circuit is manufactured by a method in which the circuit configuration cannot be changed after manufacturing, for example, a cell-based method.
  • the second parameter table circuit is manufactured by a method in which the circuit configuration can be changed after manufacturing, for example, a gate array method or an FPGA method.
  • FIG. 16 is an explanatory diagram showing the characteristics of the circuit after manufacturing in each manufacturing method.
  • For a circuit manufactured by the cell-based method, it is “impossible” to change the circuit after manufacturing, but the circuit area can be reduced.
  • For a circuit manufactured by the FPGA method, it is “possible” to change the circuit after manufacturing, but the circuit area becomes large.
  • the circuit area is larger than that of the cell-based method.
  • it is desirable that the circuit is manufactured by the cell-based method.
  • it is desirable that the circuit is manufactured by the gate array method or the FPGA method.
  • As parameter value output circuits composed of combinational circuits, the information processing circuit of the present embodiment includes a first parameter value output circuit manufactured by a method in which the circuit configuration cannot be changed (for example, a cell-based method) and a second parameter value output circuit manufactured by a method in which the circuit configuration can be changed.
  • the first parameter value output circuit is manufactured based on the first parameter table 304, and outputs a first parameter value by calculating the logical operation used in every one of the plurality of types of parameter sets used in the CNN.
  • the second parameter value output circuit is manufactured by a method that allows the circuit configuration to be changed after manufacturing, and is adjusted based on the second parameter table 305 after manufacturing. It receives the output of the first parameter value output circuit in addition to the parameter table input, and outputs a second parameter value by calculating individual logical operations. In this way, by fixing the parameters that can be shared and keeping the parameters used individually for each application changeable, the weights (parameters) can be updated while maintaining area efficiency.
  • the inference device as the information processing circuit of the present embodiment includes a product-sum circuit that performs a product-sum operation using input data and a parameter value, and a parameter value output circuit that outputs the parameter value.
  • the parameter value output circuit is composed of a combinational circuit, and includes a first parameter value output circuit manufactured by a method in which the circuit configuration cannot be changed and a second parameter value output circuit manufactured by a method in which the circuit configuration can be changed.
  • the inference device of the present embodiment can update the weight (parameter) while maintaining the area efficiency.
  • the second parameter value output circuit can be adjusted so that it receives the output of the first parameter value output circuit in addition to the parameter table input and outputs the calculation result of the individual logical operations.
  • Each component in the information processing circuit design devices 500 and 510 shown in FIGS. 4 and 10 can be configured by one hardware or one software. Further, each component can be configured by a plurality of hardware or a plurality of software. It is also possible to configure a part of each component with hardware and another part with software.
  • FIG. 17 is a block diagram showing an example of a computer having a CPU.
  • a computer having a processor such as a CPU (Central Processing Unit) and a memory
  • FIG. 17 shows a storage device 1001 and a memory 1002 connected to the CPU 1000.
  • the CPU 1000 realizes each function in the information processing circuit design devices 500 and 510 shown in FIGS. 4 and 10 by executing a process (information processing circuit design process) according to a program stored in the storage device 1001.
  • the computer realizes the functions of the parameter table optimization units 501 and 511, the parameter table generation units 502 and 512, the parallel degree determination units 503 and 513, the arithmetic unit generation units 504 and 514, and the parameter table division unit 515 in the information processing circuit design devices 500 and 510 shown in FIGS. 4 and 10.
  • the program may also be stored on various types of transitory computer-readable media.
  • the program is supplied through the transitory computer-readable medium, for example, via a wired or wireless channel, that is, via an electrical signal, an optical signal, or an electromagnetic wave.
  • the memory 1002 is realized by, for example, a RAM (Random Access Memory), and is a storage means for temporarily storing data when the CPU 1000 executes processing. A form is also conceivable in which a program held by the storage device 1001 or a transitory computer-readable medium is transferred to the memory 1002, and the CPU 1000 executes processing based on the program in the memory 1002.
  • FIG. 18 is a block diagram showing a main part of an information processing circuit.
  • the information processing circuit 10 is an information processing circuit that executes layer operations in deep learning, and includes a product-sum circuit 11 (in the embodiment, realized by the product-sum circuit 301) that performs a product-sum operation using input data and parameter values, and a parameter value output circuit 12 (in the embodiment, realized by the first parameter table 304 and the second parameter table 305) that outputs the parameter values.
  • the parameter value output circuit 12 is composed of a combinational circuit, and includes a first parameter value output circuit 13 (in the embodiment, realized by the first parameter table 304) manufactured by a method in which the circuit configuration cannot be changed and a second parameter value output circuit 14 (in the embodiment, realized by the second parameter table 305) manufactured by a method in which the circuit configuration can be changed.
  • FIG. 19 is a block diagram showing a main part of the information processing circuit design device.
  • the information processing circuit design device 20 includes: an input means 21 (in the embodiment, realized as part of the parameter table optimization unit 511 and the parallel degree determination unit 513) for inputting a plurality of types of parameter sets, each including a plurality of learned parameter values, and data capable of specifying a network structure; an arithmetic unit generating means 22 (in the embodiment, realized by the arithmetic unit generation unit 514) for creating a product-sum circuit that performs a product-sum operation using input data and parameter values and is specialized for a layer in the network structure; and a parameter value output circuit creating means 23 for creating a combinational circuit that outputs the parameter values in the plurality of types of parameter sets.
  • the parameter value output circuit creating means 23 includes a first parameter value output circuit creating means 24 (in the embodiment, realized by the parameter table optimization unit 511, the parameter table division unit 515, and the parameter table generation unit 512) for creating a first parameter value output circuit realized by a method in which the circuit configuration cannot be changed, and a second parameter value output circuit creating means for creating a second parameter value output circuit realized by a method in which the circuit configuration can be changed.
  • Appendix 1 An information processing circuit that executes layer operations in deep learning, comprising:
  • a product-sum circuit that performs a product-sum operation using input data and parameter values, and a parameter value output circuit that outputs the parameter values, wherein
  • the parameter value output circuit is composed of a combinational circuit and includes
  • a first parameter value output circuit manufactured by a method in which the circuit configuration cannot be changed, and
  • Appendix 3 The information processing circuit of Appendix 1 or Appendix 2, comprising a number of basic circuits according to the number of parallel processes, wherein one or more of the plurality of basic circuits include the product-sum circuit, the first parameter value output circuit, and the second parameter value output circuit.
  • A method of designing an information processing circuit that generates an information processing circuit that executes layer operations in deep learning, the method comprising: inputting a plurality of types of parameter sets, each including a plurality of trained parameter values, and data that can identify a network structure; creating a product-sum circuit that performs a product-sum operation using input data and parameter values and is specialized for a layer in the network structure; and creating, as combinational circuits that output the parameter values in the plurality of types of parameter sets, a first parameter value output circuit realized by a method in which the circuit configuration cannot be changed and a second parameter value output circuit realized by a method in which the circuit configuration can be changed.
  • the first parameter value output circuit for calculating the first logical operation used in any type of parameter set is created.
  • the information processing circuit design program causes a processor to execute: a process of inputting a plurality of types of parameter sets, each including a plurality of trained parameter values, and data that can identify a network structure; a process of creating a product-sum circuit that performs a product-sum operation using input data and parameter values and is specialized for a layer in the network structure; and, as processes of creating combinational circuits that output the parameter values in the plurality of types of parameter sets, a process of creating a first parameter value output circuit realized by a method in which the circuit configuration cannot be changed and a process of creating a second parameter value output circuit realized by a method in which the circuit configuration can be changed.
  • The recording medium of Appendix 7, wherein the information processing circuit design program causes the processor to execute a process of creating the first parameter value output circuit, which calculates a first logical operation used in every type of parameter set among the plurality of types of parameter sets used in the neural network, and a process of creating the second parameter value output circuit, which receives the output of the first parameter value output circuit in addition to the parameter table input and calculates individual logical operations.
  • In the information processing circuit design device, the first parameter value output circuit creating means creates the first parameter value output circuit, which calculates a first logical operation used in every type of parameter set among the plurality of types of parameter sets used in the neural network, and the second parameter value output circuit creating means creates the second parameter value output circuit, which receives the output of the first parameter value output circuit in addition to the parameter table input and calculates individual logical operations.
  • Appendix 11 An information processing circuit design program for generating an information processing circuit that executes layer operations in deep learning, the program causing a computer to execute: a process of inputting a plurality of types of parameter sets, each including a plurality of trained parameter values, and data that can identify a network structure; a process of creating a product-sum circuit that performs a product-sum operation using input data and parameter values and is specialized for a layer in the network structure; and, as processes of creating combinational circuits that output the parameter values in the plurality of types of parameter sets, a process of creating a first parameter value output circuit realized by a method in which the circuit configuration cannot be changed and a process of creating a second parameter value output circuit realized by a method in which the circuit configuration can be changed.
  • Appendix 12 The information processing circuit design program of Appendix 11, causing the computer to execute a process of creating the first parameter value output circuit, which calculates a first logical operation used in every type of parameter set among the plurality of types of parameter sets used in the neural network, and a process of creating the second parameter value output circuit, which receives the output of the first parameter value output circuit in addition to the parameter table input and calculates individual logical operations.
  • Second parameter value output circuit
  • 100, 101 Information processing circuit
  • 201, 202, 203, 204, 205, 206 Arithmetic unit
  • 211, 212, 213, 214, 215, 216 Parameter
  • 221, 222, 223, 224, 225, 226 First parameter
  • 231, 232, 233, 234, 235, 236 Second parameter
  • 300, 310 Basic circuit
  • 301 Product-sum circuit
  • 302 Parameter table
  • 3021 Parameter A table
  • 3022 Parameter B table
  • 303 Register
  • 304 First parameter table
  • 305 Second parameter table
  • 400 Control unit
  • 500, 510 Information processing circuit design device
  • 501, 511 Parameter table optimization unit
  • 502, 512 Parameter table generation unit
  • 503, 513 Parallel degree determination unit
  • 504, 514 Arithmetic unit generation unit
  • 515 Parameter table division unit
  • 1000 CPU
  • 1001 Storage device
  • 1002 Memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Geometry (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Neurology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)
  • Image Analysis (AREA)
PCT/JP2020/020701 2020-05-26 2020-05-26 Information processing circuit and method for designing information processing circuit Ceased WO2021240633A1 (ja)

Priority Applications (4)

Application Number Priority Date Filing Date Title
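JP2022527306A JP7456501B2 (ja) 2020-05-26 2020-05-26 Information processing circuit and method for designing information processing circuit
US17/926,728 US20230205957A1 (en) 2020-05-26 2020-05-26 Information processing circuit and method for designing information processing circuit
PCT/JP2020/020701 WO2021240633A1 (ja) 2020-05-26 2020-05-26 Information processing circuit and method for designing information processing circuit
TW110114833A TWI841838B (zh) 2020-05-26 2021-04-26 Information processing circuit and method for designing information processing circuit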

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/020701 WO2021240633A1 (ja) 2020-05-26 2020-05-26 Information processing circuit and method for designing information processing circuit

Publications (1)

Publication Number Publication Date
WO2021240633A1 (ja) 2021-12-02

Family

ID=78723071

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/020701 Ceased WO2021240633A1 (ja) 2020-05-26 2020-05-26 Information processing circuit and method for designing information processing circuit

Country Status (4)

Country Link
US (1) US20230205957A1
JP (1) JP7456501B2
TW (1) TWI841838B
WO (1) WO2021240633A1

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002117389A (ja) * 1990-01-24 2002-04-19 Hitachi Ltd Information processing device
JP2018133016A (ja) * 2017-02-17 2018-08-23 株式会社半導体エネルギー研究所 Neural network system
US20190318232A1 (en) * 2013-10-11 2019-10-17 Hrl Laboratories, Llc Scalable Integrated Circuit with Synaptic Electronics and CMOS integrated Memristors

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6864224B2 (ja) * 2017-01-27 2021-04-28 富士通株式会社 Processor, information processing device, and method for operating processor
CN110036384B (zh) * 2017-09-29 2021-01-05 索尼公司 Information processing device and information processing method
TWI659324B (zh) * 2018-02-14 2019-05-11 倍加科技股份有限公司 Method and system for generating circuit planning results
US11586907B2 (en) * 2018-02-27 2023-02-21 Stmicroelectronics S.R.L. Arithmetic unit for deep learning acceleration
US11886980B2 (en) * 2019-08-23 2024-01-30 Nvidia Corporation Neural network accelerator using logarithmic-based arithmetic
WO2021084717A1 (ja) * 2019-10-31 2021-05-06 日本電気株式会社 Information processing circuit and method for designing information processing circuit
JP7364026B2 (ja) * 2020-02-14 2023-10-18 日本電気株式会社 Information processing circuit
US20230376769A1 (en) * 2022-05-18 2023-11-23 Seyed Alireza GHAFFARI Method and system for training machine learning models using dynamic fixed-point data representations

Also Published As

Publication number Publication date
TWI841838B (zh) 2024-05-11
US20230205957A1 (en) 2023-06-29
JPWO2021240633A1 2021-12-02
JP7456501B2 (ja) 2024-03-27
TW202147162A (zh) 2021-12-16

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20937572; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2022527306; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20937572; Country of ref document: EP; Kind code of ref document: A1)